Preface


This book provides an introduction to the Python programming language. Python is a 
popular open source programming language used for both standalone programs and 
scripting applications in a wide variety of domains. It is free, portable, powerful, and 
remarkably easy and fun to use. Programmers from every corner of the software industry
have found Python’s focus on developer productivity and software quality to be
a strategic advantage in projects both large and small. 


Whether you are new to programming or are a professional developer, this book’s goal 
is to bring you quickly up to speed on the fundamentals of the core Python language. 
After reading this book, you will know enough about Python to apply it in whatever 
application domains you choose to explore. 


By design, this book is a tutorial that focuses on the core Python language itself, rather
than specific applications of it. As such, it’s intended to serve as the first in a two-volume
set:


• Learning Python, this book, teaches Python itself.

• Programming Python, among others, shows what you can do with Python after
  you've learned it.


That is, applications-focused books such as Programming Python pick up where this
book leaves off, exploring Python’s role in common domains such as the Web, graphical
user interfaces (GUIs), and databases. In addition, the book Python Pocket Reference
provides additional reference materials not included here, and it is designed to
supplement this book.


Because of this book’s foundations focus, though, it is able to present Python
fundamentals with more depth than many programmers see when first learning the language.
And because it’s based upon a three-day Python training class with quizzes and
exercises throughout, this book serves as a self-paced introduction to the language.
About This Fourth Edition


This fourth edition of this book has changed in three ways. This edition: 


• Covers both Python 3.0 and Python 2.6; it emphasizes 3.0, but notes differences
  in 2.6

• Includes a set of new chapters mainly targeted at advanced core-language topics

• Reorganizes some existing material and expands it with new examples for clarity


As I write this edition in 2009, Python comes in two flavors—version 3.0 is an emerging 
and incompatible mutation of the language, and 2.6 retains backward compatibility 
with the vast body of existing Python code. Although Python 3 is viewed as the future 
of Python, Python 2 is still widely used and will be supported in parallel with Python 
3 for years to come. While 3.0 is largely the same language, it runs almost no code 
written for prior releases (the mutation of print from statement to function alone, 
aesthetically sound as it may be, breaks nearly every Python program ever written). 


This split presents a bit of a dilemma for both programmers and book authors. While 
it would be easier for a book to pretend that Python 2 never existed and cover 3 only, 
this would not address the needs of the large Python user base that exists today. A vast 
amount of existing code was written for Python 2, and it won’t be going away any time 
soon. And while newcomers to the language can focus on Python 3, anyone who must 
use code written in the past needs to keep one foot in the Python 2 world today. Since 
it may be years before all third-party libraries and extensions are ported to Python 3, 
this fork might not be entirely temporary. 


Coverage for Both 3.0 and 2.6 


To address this dichotomy and to meet the needs of all potential readers, this edition 
of this book has been updated to cover both Python 3.0 and Python 2.6 (and later 
releases in the 3.X and 2.X lines). It’s intended for programmers using Python 2,
programmers using Python 3, and programmers stuck somewhere between the two.


That is, you can use this book to learn either Python line. Although the focus here is 
on 3.0 primarily, 2.6 differences and tools are also noted along the way for programmers 
using older code. While the two versions are largely the same, they diverge in some 
important ways, and I’ll point these out along the way. 


For instance, I’ll use 3.0 print calls in most examples, but will describe the 2.6 print
statement, too, so you can make sense of earlier code. I’ll also freely introduce new
features, such as the nonlocal statement in 3.0 and the string format method in 2.6 and
3.0, and will point out when such extensions are not present in older Pythons.
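
As a quick, hedged illustration of the three tools just named, the following sketch assumes a Python 3.X interpreter. The names buffer, make_counter, and increment are invented here for illustration and are not taken from this book's examples. It shows a 3.0 print call routed to an in-memory stream, the string format method, and a nonlocal statement.

```python
import io

# In 3.0, print is a function: the separator, line ending, and output
# stream are ordinary keyword arguments (2.6 used "print >> F, x, y").
buffer = io.StringIO()                      # stands in for a real output file
print('spam', 'eggs', sep='-', end='!\n', file=buffer)
printed = buffer.getvalue()

# The string format method (2.6 and 3.0): substitute by position or keyword.
formatted = '{0} and {food}'.format('spam', food='eggs')

# The nonlocal statement (3.0 only): assign to a name in an enclosing def.
def make_counter():
    count = 0
    def increment():
        nonlocal count                      # without this, count would be local
        count += 1
        return count
    return increment

counter = make_counter()
first, second = counter(), counter()
print(repr(printed), formatted, first, second)
```

In 2.6 the last function would need a workaround, such as keeping the count in a mutable object or a class, because nonlocal is not available there.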


If you are learning Python for the first time and don’t need to use any legacy code, I 
encourage you to begin with Python 3.0; it cleans up some longstanding warts in the 
language, while retaining all the original core ideas and adding some nice new tools. 
Many popular Python libraries and tools will likely be available for Python 3.0 by the
time you read these words, especially given the file I/O performance improvements 
expected in the upcoming 3.1 release. If you are using a system based on Python 2.X, 
however, you'll find that this book addresses your concerns, too, and will help you 
migrate to 3.0 in the future. 


By proxy, this edition addresses other Python version 2 and 3 releases as well, though 
some older version 2.X code may not be able to run all the examples here. Although 
class decorators are available in both Python 2.6 and 3.0, for example, you cannot use 
them in an older Python 2.X that did not yet have this feature. See Tables P-1 and P-2 
later in this Preface for summaries of 2.6 and 3.0 changes. 


Shortly before going to press, this book was also augmented with notes about
prominent extensions in the upcoming Python 3.1 release: comma separators and
automatic field numbering in string format method calls, multiple context manager
syntax in with statements, new methods for numbers, and so on. Because Python 3.1
was targeted primarily at optimization, this book applies directly to this new
release as well. In fact, because Python 3.1 supersedes 3.0, and because the latest
Python is usually the best Python to fetch and use anyhow, in this book the term
“Python 3.0” generally refers to the language variations introduced by Python 3.0
but that are present in the entire 3.X line.
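
The 3.1 extensions mentioned in this note can be sketched as follows. This is only a minimal example, assuming Python 3.1 or later; the variable names are made up for illustration, and int.bit_length is used as one representative of the new number methods.

```python
import io

# Comma separators for thousands in string format method calls.
with_commas = '{0:,}'.format(1234567)

# Automatic field numbering: empty braces are numbered left to right.
auto_numbered = '{} {}'.format('spam', 'eggs')

# Multiple context managers in a single with statement.
with io.StringIO('spam\n') as source, io.StringIO() as target:
    target.write(source.read().upper())
    copied = target.getvalue()              # fetch before target is closed

# One of the new methods for numbers: bits needed to represent an integer.
bits = (255).bit_length()

print(with_commas, auto_numbered, repr(copied), bits)
```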


New Chapters 


Although the main purpose of this edition is to update the examples and material from 
the preceding edition for 3.0 and 2.6, I’ve also added five new chapters to address new 
topics and add context: 


• Chapter 27 is a new class tutorial, using a more realistic example to explore the
  basics of Python object-oriented programming (OOP).

• Chapter 36 provides details on Unicode and byte strings and outlines string and
  file differences between 3.0 and 2.6.

• Chapter 37 collects managed attribute tools such as properties and provides new
  coverage of descriptors.

• Chapter 38 presents function and class decorators and works through
  comprehensive examples.

• Chapter 39 covers metaclasses and compares and contrasts them with decorators.


The first of these chapters provides a gradual, step-by-step tutorial for using classes and 
OOP in Python. It’s based upon a live demonstration I have been using in recent years 
in the training classes I teach, but has been honed here for use in a book. The chapter 
is designed to show OOP in a more realistic context than earlier examples and to 
illustrate how class concepts come together into larger, working programs. I hope it
works as well here as it has in live classes. 


The last four of these new chapters are collected in a new final part of the book,
“Advanced Topics.” Although these are technically core language topics, not every Python
programmer needs to delve into the details of Unicode text or metaclasses. Because of 
this, these four chapters have been separated out into this new part, and are officially 
optional reading. The details of Unicode and binary data strings, for example, have been 
moved to this final part because most programmers use simple ASCII strings and don’t 
need to know about these topics. Similarly, decorators and metaclasses are specialist 
topics that are usually of more interest to API builders than application programmers. 


If you do use such tools, though, or use code that does, these new advanced topic 
chapters should help you master the basics. In addition, these chapters’ examples
include case studies that tie core language concepts together, and they are more
substantial than those in most of the rest of the book. Because this new part is optional
reading, it has end-of-chapter quizzes but no end-of-part exercises.


Changes to Existing Material 


In addition, some material from the prior edition has been reorganized, or
supplemented with new examples. Multiple inheritance, for instance, gets a new case
study example that lists class trees in Chapter 30; new examples for generators that
manually implement map and zip are provided in Chapter 20; static and class methods
are illustrated by new code in Chapter 31; package relative imports are captured in
action in Chapter 23; and the __contains__, __bool__, and __index__ operator
overloading methods are illustrated by example now as well in Chapter 29, along with
the new overloading protocols for slicing and comparison.
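
To give a flavor of two of these topics, here is a small sketch. It is not the code from Chapters 20 and 29: the names myzip, mymap, and Basket are hypothetical stand-ins, and the code assumes Python 3.X.

```python
def myzip(*seqs):
    # Generator that mimics 3.0's zip: stop at the shortest sequence.
    iters = [iter(seq) for seq in seqs]
    while iters:
        items = []
        for it in iters:
            try:
                items.append(next(it))
            except StopIteration:
                return                      # any exhausted iterator ends the run
        yield tuple(items)

def mymap(func, *seqs):
    # Generator that mimics 3.0's map: call func on parallel items on demand.
    for args in myzip(*seqs):
        yield func(*args)

class Basket:
    # Hypothetical container used only to show three overloading methods.
    def __init__(self, items):
        self.items = list(items)
    def __contains__(self, item):           # run for: item in basket
        return item in self.items
    def __bool__(self):                     # run for truth tests (3.0 name)
        return len(self.items) > 0
    def __index__(self):                    # run for integer contexts: hex, bin
        return len(self.items)

pairs = list(myzip('abc', 'xy'))
powers = list(mymap(pow, [1, 2, 3], [2, 3, 4]))
basket = Basket(['spam', 'eggs'])
print(pairs, powers, 'spam' in basket, bool(Basket([])), hex(basket))
```

Both generators produce results only as they are requested, which is why the calls above are wrapped in list to force all values at once.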


This edition also incorporates some reorganization for clarity. For instance, to
accommodate new material and topics, and to avoid chapter topic overload, five prior chapters
have been split into two each here. The result is new standalone chapters on operator 
overloading, scopes and arguments, exception statement details, and comprehension 
and iteration topics. Some reordering has been done within the existing chapters as 
well, to improve topic flow. 


This edition also tries to minimize forward references with some reordering, though 
Python 3.0’s changes make this impossible in some cases: to understand printing and 
the string format method, you now must know keyword arguments for functions; to 
understand dictionary key lists and key tests, you must now know iteration; to use 
exec to run code, you need to be able to use file objects; and so on. A linear reading 
still probably makes the most sense, but some topics may require nonlinear jumps and 
random lookups. 


All told, there have been hundreds of changes in this edition. The next section’s tables 
alone document 27 additions and 57 changes in Python. In fact, it’s fair to say that this 
edition is somewhat more advanced, because Python is somewhat more advanced. As
for Python 3.0 itself, though, you’re probably better off discovering most of this book’s 
changes for yourself, rather than reading about them further in this Preface. 


Specific Language Extensions in 2.6 and 3.0 


In general, Python 3.0 is a cleaner language, but it is also in some ways a more
sophisticated language. In fact, some of its changes seem to assume you must already know
Python in order to learn Python! The prior section outlined some of the more prominent
circular knowledge dependencies in 3.0; as a random example, the rationale for
wrapping dictionary views in a list call is incredibly subtle and requires substantial
foreknowledge. Besides teaching Python fundamentals, this book serves to help bridge this
knowledge gap.
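
The dictionary view example can be made concrete with a few lines. This is a minimal sketch under the assumption of a Python 3.X interpreter; the dictionary contents and variable names are arbitrary.

```python
D = dict(a=1, b=2, c=3)

keys = D.keys()            # in 3.0 this is a view object, not a list
try:
    keys[0]                # views do not support indexing
    indexable = True
except TypeError:
    indexable = False

key_list = list(keys)      # wrap the view in a list call to get a real list
D['d'] = 4                 # a view reflects later changes; the list copy does not
sizes = (len(keys), len(key_list))

for key in sorted(D):      # iteration and key tests work without any list call
    assert key in D

print(indexable, sizes)
```

The list call is needed only where list operations such as indexing are required; loops and membership tests can use the view, or the dictionary itself, directly.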


Table P-1 lists the most prominent new language features covered in this edition, along 
with the primary chapters in which they appear. 


Table P-1. Extensions in Python 2.6 and 3.0 


Extension                                                          Covered in chapter(s)
The print function in 3.0                                          11
The nonlocal x,y statement in 3.0                                  17
The str.format method in 2.6 and 3.0                               7
String types in 3.0: str for Unicode text, bytes for binary data   7, 36
Text and binary file distinctions in 3.0                           9, 36
Class decorators in 2.6 and 3.0: @private('age')                   31, 38
New iterators in 3.0: range, map, zip                              14, 20
Dictionary views in 3.0: D.keys, D.values, D.items                 8, 14
Division operators in 3.0: remainders, / and //                    5
Set literals in 3.0: {a, b, c}                                     5
Set comprehensions in 3.0: {x**2 for x in seq}                     4, 5, 14, 20
Dictionary comprehensions in 3.0: {x: x**2 for x in seq}           4, 8, 14, 20
Binary digit-string support in 2.6 and 3.0: 0b0101, bin(I)         5
The fraction number type in 2.6 and 3.0: Fraction(1, 3)            5
Function annotations in 3.0: def f(a:99, b:str)->int               19
Keyword-only arguments in 3.0: def f(a, *b, c, **d)                18, 20
Extended sequence unpacking in 3.0: a, *b = seq                    11, 13
Relative import syntax for packages enabled in 3.0: from .         23
Context managers enabled in 2.6 and 3.0: with/as                   33, 35
Exception syntax changes in 3.0: raise, except/as, superclass      33, 34
Exception chaining in 3.0: raise e2 from e1                        33
Reserved word changes in 2.6 and 3.0
New-style class cutover in 3.0
Property decorators in 2.6 and 3.0: @property
Descriptor use in 2.6 and 3.0
Metaclass use in 2.6 and 3.0
Abstract base classes support in 2.6 and 3.0
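
Several of the extensions in Table P-1 fit in a few lines of code. The following is a rough sampler rather than an example from the chapters listed, and it assumes Python 3.X; the function f and the variable names are invented for illustration.

```python
from fractions import Fraction

squares_set = {x ** 2 for x in [1, 2, 3]}       # set comprehension
squares_map = {x: x ** 2 for x in [1, 2, 3]}    # dictionary comprehension
first, *rest = 'spam'                           # extended sequence unpacking
true_div, floor_div = 7 / 2, 7 // 2             # 3.0 division: / keeps remainders
binary = (0b0101, bin(5))                       # binary literal and bin() string
two_thirds = Fraction(1, 3) + Fraction(1, 3)    # exact rational arithmetic

def f(a: 99, *b, c, **d) -> int:                # annotations; c is keyword-only
    return a + c

try:
    f(1, 2)                                     # c must be passed by keyword
    keyword_required = False
except TypeError:
    keyword_required = True

print(squares_set, squares_map, first, rest, true_div, floor_div)
print(binary, two_thirds, f(1, 2, 3, c=4), f.__annotations__, keyword_required)
```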


Specific Language Removals in 3.0 


In addition to extensions, a number of language tools have been removed in 3.0 in an
effort to clean up its design. Table P-2 summarizes the changes that impact this book,
covered in various chapters of this edition. Many of the removals listed in Table P-2
have direct replacements, some of which are also available in 2.6 to support future
migration to 3.0.


Table P-2. Removals in Python 3.0 that impact this book 


Removed                         Replacement                                       Covered in chapter(s)
reload(M)                       imp.reload(M) (or exec)                           3, 22
apply(f, ps, ks)                f(*ps, **ks)
`X`                             repr(X)
X <> Y                          X != Y
long                            int
9999L                           9999
D.has_key(K)                    K in D (or D.get(key) != None)
raw_input                       input
old input                       eval(input())
xrange                          range
file                            open (and io module classes)
X.next                          X.__next__, called by next(X)                     14, 20, 29
X.__getslice__                  X.__getitem__ passed a slice object               7, 29
X.__setslice__                  X.__setitem__ passed a slice object               7, 29
reduce                          functools.reduce (or loop code)                   14, 19
execfile(filename)              exec(open(filename).read())
exec open(filename)             exec(open(filename).read())
0777                            0o777
print x, y                      print(x, y)
print >> F, x, y                print(x, y, file=F)                               11
print x, y,                     print(x, y, end=' ')                              11
u'ccc'                          'ccc'                                             7, 36
'bbb' for byte strings          b'bbb'                                            7, 9, 36
raise E, V                      raise E(V)                                        32, 33, 34
except E, X:                    except E as X:                                    32, 33, 34
def f((a, b)):                  def f(x): (a, b) = x                              11, 18, 20
file.xreadlines                 for line in file: (or X=iter(file))               13, 14
D.keys(), etc. as lists         list(D.keys()) (dictionary views)                 8, 14
map(), range(), etc. as lists   list(map()), list(range()) (built-ins)            14
map(None, ...)                  zip (or manual code to pad results)               13, 20
X=D.keys(); X.sort()            sorted(D) (or list(D.keys()))                     4, 8, 14
cmp(x, y)                       (x > y) - (x < y)                                 29
X.__cmp__(y)                    __lt__, __gt__, __eq__, etc.                      29
X.__nonzero__                   X.__bool__                                        29
X.__hex__, X.__oct__            X.__index__                                       29
Sort comparison functions       Use key=transform or reverse=True                 8
Dictionary <, >, <=, >=         Compare sorted(D.items()) (or loop code)          8, 9
types.ListType                  list (types is for nonbuilt-in names only)        9
__metaclass__ = M               class C(metaclass=M):                             28, 31, 39
__builtin__                     builtins (renamed)                                17
Tkinter                         tkinter (renamed)                                 18, 19, 24, 29, 30
sys.exc_type, exc_value         sys.exc_info()[0], [1]                            34, 35
function.func_code              function.__code__                                 19, 38
__getattr__ run by built-ins    Redefine __X__ methods in wrapper classes         30, 37, 38
-t, -tt command-line switches   Inconsistent tabs/spaces use is always an error   10, 12
from ... *, within a function   May only appear at the top level of a file        22
import mod, in same package     from . import mod, package-relative form          23
class MyException:              class MyException(Exception):
exceptions module               Built-in scope, library manual
thread, Queue modules           _thread, queue (both renamed)
anydbm module                   dbm (renamed)
cPickle module                  _pickle (renamed, used automatically)
os.popen2/3/4                   subprocess.Popen (os.popen retained)
String-based exceptions         Class-based exceptions (also required in 2.6)     32, 33, 34
String module functions         String object methods                             7
Unbound methods                 Functions (staticmethod to call via instance)     30, 31
Mixed type comparisons, sorts   Nonnumeric mixed type comparisons are errors      5, 9
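
A handful of the replacements in Table P-2 are sketched below, with the removed 2.X form noted in each comment. This is an illustrative sample only, assuming Python 3.X; the data values and names are arbitrary, and the cmp function defined here is a stand-in written for this sketch, not a built-in.

```python
from functools import reduce                        # 2.X: built-in reduce

D = {'b': 2, 'a': 1, 'c': 3}

total = reduce(lambda x, y: x + y, [1, 2, 3, 4])    # reduce now lives in functools
has_a = 'a' in D                                    # 2.X: D.has_key('a')
ordered = sorted(D)                                 # 2.X: X = D.keys(); X.sort()
same = sorted(D.items()) == sorted({'a': 1, 'b': 2, 'c': 3}.items())  # 2.X: D1 == D2 via <, >

def cmp(x, y):                                      # 2.X: built-in cmp(x, y)
    return (x > y) - (x < y)

by_length = sorted(['ccc', 'a', 'bb'], key=len, reverse=True)  # 2.X: sort(cmpfunc)

I = iter([1, 2])
first = next(I)                                     # 2.X: I.next()
octal = 0o777                                       # 2.X: 0777

try:
    raise ValueError('bad')                         # 2.X: raise ValueError, 'bad'
except ValueError as exc:                           # 2.X: except ValueError, exc:
    message = str(exc)

print(total, has_a, ordered, same, cmp(1, 2), by_length, first, octal, message)
```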


There are additional changes in Python 3.0 that are not listed in this table, simply 
because they don’t affect this book. Changes in the standard library, for instance, might 
have a larger impact on applications-focused books like Programming Python than they 
do here; although most standard library functionality is still present, Python 3.0 takes 
further liberties with renaming modules, grouping them into packages, and so on. For 
a more comprehensive list of changes in 3.0, see the “What’s New in Python 3.0” 
document in Python’s standard manual set. 


If you are migrating from Python 2.X to Python 3.X, be sure to also see the 2to3
automatic code conversion script that is available with Python 3.0. It can’t translate
everything, but it does a reasonable job of converting the majority of 2.X code to run under
3.X. As I write this, a new 3to2 back-conversion project is also underway to translate 
Python 3.X code to run in 2.X environments. Either tool may prove useful if you must 
maintain code for both Python lines; see the Web for details. 


Because this fourth edition is mostly a fairly straightforward update for 3.0 with a 
handful of new chapters, and because it’s only been two years since the prior edition 
was published, the rest of this Preface is taken from the prior edition with only minor 
updating. 


About The Third Edition 


In the four years between the publication of the second and third editions of this book 
there were substantial changes in Python itself, and in the topics I presented in Python 
training sessions. The third edition reflected these changes, and also incorporated a 
handful of structural changes. 


The Third Edition’s Python Language Changes 


On the language front, the third edition was thoroughly updated to reflect Python 2.5 
and all changes to the language since the publication of the second edition in late 2003. 
(The second edition was based largely on Python 2.2, with some 2.3 features grafted 
on at the end of the project.) In addition, discussions of anticipated changes in the 
upcoming Python 3.0 release were incorporated where appropriate. Here are some of 
the major language topics for which new or expanded coverage was provided (chapter 
numbers here have been updated to reflect the fourth edition): 
• The new B if A else C conditional expression (Chapter 19)

• with/as context managers (Chapter 33)

• try/except/finally unification (Chapter 33)

• Relative import syntax (Chapter 23)

• Generator expressions (Chapter 20)

• New generator function features (Chapter 20)

• Function decorators (Chapter 31)

• The set object type (Chapter 5)

• New built-in functions: sorted, sum, any, all, enumerate (Chapters 13 and 14)

• The decimal fixed-precision object type (Chapter 5)

• Files, list comprehensions, and iterators (Chapters 14 and 20)

• New development tools: Eclipse, distutils, unittest and doctest, IDLE
  enhancements, Shedskin, and so on (Chapters 2 and 35)
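
A few of the language topics in this list can be previewed in one short sketch. It is a hedged sampler, not code from the chapters cited; it assumes Python 3.X, and all names and values are invented for illustration.

```python
from decimal import Decimal

x = 5
label = 'big' if x > 3 else 'small'             # B if A else C expression
total = sum(n ** 2 for n in range(4))           # generator expression inside sum
ranked = list(enumerate(sorted('cab')))         # sorted and enumerate built-ins
checks = (any([0, 1]), all([0, 1]))             # any and all built-ins
common = set('spam') & set('ham')               # set objects and intersection
exact = Decimal('0.1') + Decimal('0.2')         # decimal fixed-precision type

log = []
try:                                            # unified try/except/finally
    1 / 0
except ZeroDivisionError:
    log.append('except')
finally:
    log.append('finally')

print(label, total, ranked, checks, sorted(common), exact, log)
```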


Smaller language changes (for instance, the widespread use of True and False; the new 
sys.exc_info for fetching exception details; and the demise of string-based exceptions, 
string methods, and the apply and reduce built-ins) are discussed throughout the book. 
The third edition also expanded coverage of some of the features that were new in the 
second edition, including three-limit slices and the arbitrary arguments call syntax that 
subsumed apply. 
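
The last two features mentioned here are easy to show briefly. This minimal sketch assumes Python 3.X (both forms also work in 2.6); the string, the function f, and the argument names ps and ks are arbitrary choices for illustration.

```python
S = 'abcdefghij'
every_other = S[1:9:2]          # three-limit slice: start, stop, and step
backward = S[::-1]              # a negative step reverses the sequence

def f(a, b, c=0):
    return a + b + c

ps, ks = (1, 2), {'c': 3}
result = f(*ps, **ks)           # arbitrary arguments call syntax
                                # (2.X alternative: apply(f, ps, ks))
print(every_other, backward, result)
```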


The Third Edition’s Python Training Changes 


Besides such language changes, the third edition was augmented with new topics and 
examples presented in my Python training sessions. Changes included (chapter
numbers again updated to reflect those in the fourth edition):


• A new chapter introducing built-in types (Chapter 4)

• A new chapter introducing statement syntax (Chapter 10)

• A new full chapter on dynamic typing, with enhanced coverage (Chapter 6)

• An expanded OOP introduction (Chapter 25)

• New examples for files, scopes, statement nesting, classes, exceptions, and more

Many additions and changes were made with Python beginners in mind, and some
topics were moved to appear at the places where they proved simplest to digest in
training classes. List comprehensions and iterators, for example, now make their initial
appearance in conjunction with the for loop statement, instead of later with functional
tools.
Coverage of many original core language topics also was substantially expanded in the
third edition, with new discussions and examples added. Because this text has become 
something of a de facto standard resource for learning the core Python language, the 
presentation was made more complete and augmented with new use cases throughout. 


In addition, a new set of Python tips and tricks, gleaned from 10 years of teaching classes 
and 15 years of using Python for real work, was incorporated, and the exercises were 
updated and expanded to reflect current Python best practices, new language features, 
and common beginners’ mistakes witnessed firsthand in classes. Overall, the core
language coverage was expanded.


The Third Edition’s Structural Changes 


Because the material was more complete, it was split into bite-sized chunks. The core 
language material was organized into many multichapter parts to make it easier to 
tackle. Types and statements, for instance, are now two top-level parts, with one
chapter for each major type and statement topic. Exercises and “gotchas” (common
mistakes) were also moved from chapter ends to part ends, appearing at the end of the last
chapter in each part. 


In the third edition, I also augmented the end-of-part exercises with end-of-chapter 
summaries and end-of-chapter quizzes to help you review chapters as you complete 
them. Each chapter concludes with a set of questions to help you review and test your 
understanding of the chapter’s material. Unlike the end-of-part exercises, whose
solutions are presented in Appendix B, the solutions to the end-of-chapter quizzes appear
immediately after the questions; I encourage you to look at the solutions even if you’re 
sure you’ve answered the questions correctly because the answers are a sort of review 
in themselves. 


Despite all the new topics, the book is still oriented toward Python newcomers and is 
designed to be a first Python text for programmers. Because it is largely based on
time-tested training experience and materials, it can still serve as a self-paced introductory
Python class. 


The Third Edition’s Scope Changes 


As of its third edition, this book is intended as a tutorial on the core Python language, 
and nothing else. It’s about learning the language in an in-depth fashion, before
applying it in application-level programming. The presentation here is bottom-up and
gradual, but it provides a complete look at the entire language, in isolation from its 
application roles. 


For some, “learning Python” involves spending an hour or two going through a tutorial 
on the Web. This works for already advanced programmers, up to a point; Python is, 
after all, relatively simple in comparison to other languages. The problem with this
fast-track approach is that its practitioners eventually stumble onto unusual cases and get
stuck: variables change out from under them, mutable default arguments mutate
inexplicably, and so on. The goal here is instead to provide a solid grounding in Python
fundamentals, so that even the unusual cases will make sense when they crop up. 
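
The two unusual cases named above can be reproduced in a few lines. This is a small sketch assuming Python 3.X (the behavior is the same in 2.6); the function names saver and safer and the list values are hypothetical.

```python
def saver(x=[]):            # the default list is created once, when def runs
    x.append(1)
    return x

first = list(saver())       # copy each result to record its state at that time
second = list(saver())      # the same default list has grown between calls

def safer(x=None):          # a common workaround: build a new list per call
    if x is None:
        x = []
    x.append(1)
    return x

fresh = (safer(), safer())

L1 = [1, 2, 3]
L2 = L1                     # two names now reference one shared list object
L2.append(4)                # an in-place change is visible through both names
shared = L1 is L2

print(first, second, fresh, L1, shared)
```

Neither result is a bug in Python: both follow from the fact that names and defaults hold references to objects, a theme the dynamic typing coverage develops in depth.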


This scope is deliberate. By restricting our gaze to language fundamentals, we can
investigate them here in more satisfying depth. Other texts, described ahead, pick up
where this book leaves off and provide a more complete look at application-level topics 
and additional reference materials. The purpose of the book you are reading now is 
solely to teach Python itself so that you can apply it to whatever domain you happen 
to work in. 


About This Book 


This section underscores some important points about this book in general, regardless 
of its edition number. No book addresses every possible audience, so it’s important to 
understand a book’s goals up front. 


This Book’s Prerequisites 


There are no absolute prerequisites to speak of, really. Both true beginners and crusty 
programming veterans have used this book successfully. If you are motivated to learn 
Python, this text will probably work for you. In general, though, I have found that any 
exposure to programming or scripting before this book can be helpful, even if not 
required for every reader. 


This book is designed to be an introductory-level Python text for programmers.* It may
not be an ideal text for someone who has never touched a computer before (for instance, 
we're not going to spend any time exploring what a computer is), but I haven’t made 
many assumptions about your programming background or education. 


On the other hand, I won’t insult readers by assuming they are “dummies,” either, 
whatever that means—it’s easy to do useful things in Python, and this book will show 
you how. The text occasionally contrasts Python with languages such as C, C++, Java, 
and Pascal, but you can safely ignore these comparisons if you haven’t used such
languages in the past.


This Book’s Scope and Other Books 


Although this book covers all the essentials of the Python language, I’ve kept its scope
narrow in the interests of speed and size. To keep things simple, this book focuses on
core concepts, uses small and self-contained examples to illustrate points, and
sometimes omits the small details that are readily available in reference manuals.
Because of that, this book is probably best described as an introduction and a
stepping-stone to more advanced and complete texts.

* And by “programmers,” I mean anyone who has written a single line of code in any programming or scripting
language in the past. If this doesn’t include you, you will probably find this book useful anyhow, but be aware
that it will spend more time teaching Python than programming fundamentals.


For example, we won’t talk much about Python/C integration—a complex topic that 
is nevertheless central to many Python-based systems. We also won’t talk much about 
Python’s history or development processes. And popular Python applications such as 
GUIs, system tools, and network scripting get only a short glance, if they are mentioned 
at all. Naturally, this scope misses some of the big picture. 


By and large, Python is about raising the quality bar a few notches in the scripting world. 
Some of its ideas require more context than can be provided here, and I'd be remiss if 
I didn’t recommend further study after you finish this book. I hope that most readers 
of this book will eventually go on to gain a more complete understanding of application- 
level programming from other texts. 


Because of its beginner’s focus, Learning Python is designed to be naturally
complemented by O’Reilly’s other Python books. For instance, Programming Python, another
book I authored, provides larger and more complete examples, along with tutorials on 
application programming techniques, and was explicitly designed to be a follow-up 
text to the one you are reading now. Roughly, the current editions of Learning 
Python and Programming Python reflect the two halves of their author’s training 
materials—the core language, and application programming. In addition, O’Reilly’s 
Python Pocket Reference serves as a quick reference supplement for looking up some 
of the finer details skipped here. 


Other follow-up books can also provide references, additional examples, or details 
about using Python in specific domains such as the Web and GUIs. For instance, 
O’Reilly’s Python in a Nutshell and Sams’s Python Essential Reference serve as useful 
references, and O’Reilly’s Python Cookbook offers a library of self-contained examples 
for people already familiar with application programming techniques. Because reading 
books is such a subjective experience, I encourage you to browse on your own to find 
advanced texts that suit your needs. Regardless of which books you choose, though, 
keep in mind that the rest of the Python story requires studying examples that are more 
realistic than there is space for here. 


Having said that, I think you’ll find this book to be a good first text on Python, despite 
its limited scope (and perhaps because of it). You’ll learn everything you need to get 
started writing useful standalone Python programs and scripts. By the time you’ve
finished this book, you will have learned not only the language itself, but also how to apply
it well to your day-to-day tasks. And you'll be equipped to tackle more advanced topics 
and examples as they come your way. 
This Book's Style and Structure


This book is based on training materials developed for a three-day hands-on Python 
course. You'll find quizzes at the end of each chapter, and exercises at the end of the 
last chapter of each part. Solutions to chapter quizzes appear in the chapters themselves, 
and solutions to part exercises show up in Appendix B. The quizzes are designed to 
review material, while the exercises are designed to get you coding right away and are 
usually one of the highlights of the course. 


I strongly recommend working through the quizzes and exercises along the way, not 
only to gain Python programming experience, but also because some of the exercises 
raise issues not covered elsewhere in the book. The solutions in the chapters and in 
Appendix B should help you if you get stuck (and you are encouraged to peek at the 
answers as much and as often as you like). 


The overall structure of this book is also derived from class materials. Because this text
is designed to introduce language basics quickly, I’ve organized the presentation by
major language features, not examples. We’ll take a bottom-up approach here: from
built-in object types, to statements, to program units, and so on. Each chapter is fairly
self-contained, but later chapters draw upon ideas introduced in earlier ones (e.g., by
the time we get to classes, I’ll assume you know how to write functions), so a linear
reading makes the most sense for most readers.


In general terms, this book presents the Python language in a linear fashion. It is
organized with one part per major language feature (types, functions, and so forth), and
most of the examples are small and self-contained (some might also call the examples
in this text artificial, but they illustrate the points it aims to make). More specifically, 
here is what you will find: 


Part I, Getting Started 

We begin with a general overview of Python that answers commonly asked initial 
questions—why people use the language, what it’s useful for, and so on. The first 
chapter introduces the major ideas underlying the technology to give you some 
background context. Then the technical material of the book begins, as we explore 
the ways that both we and Python run programs. The goal of this part of the book 
is to give you just enough information to be able to follow along with later examples 
and exercises. 


Part II, Types and Operations 
Next, we begin our tour of the Python language, studying Python’s major built-in 
object types in depth: numbers, lists, dictionaries, and so on. You can get a lot done 
in Python with these tools alone. This is the most substantial part of the book 
because we lay groundwork here for later chapters. We’ll also look at dynamic 
typing and its references—keys to using Python well—in this part. 

Part III, Statements and Syntax 
The next part moves on to introduce Python’s statements—the code you type to 
create and process objects in Python. It also presents Python’s general syntax 
model. Although this part focuses on syntax, it also introduces some related tools, 
such as the PyDoc system, and explores coding alternatives. 

Part IV, Functions 
This part begins our look at Python’s higher-level program structure tools. Func- 
tions turn out to be a simple way to package code for reuse and avoid code redun- 
dancy. In this part, we will explore Python’s scoping rules, argument-passing 
techniques, and more. 


Part V, Modules 
Python modules let you organize statements and functions into larger components, 
and this part illustrates how to create, use, and reload modules. We’ll also look at 
some more advanced topics here, such as module packages, module reloading, and 
the __name__ variable. 


Part VI, Classes and OOP 
Here, we explore Python’s object-oriented programming tool, the class—an op- 
tional but powerful way to structure code for customization and reuse. As you'll 
see, classes mostly reuse ideas we will have covered by this point in the book, and 
OOP in Python is mostly about looking up names in linked objects. As you'll also 
see, OOP is optional in Python, but it can shave development time substantially, 
especially for long-term strategic project development. 


Part VII, Exceptions and Tools 
We conclude the language fundamentals coverage in this text with a look at Py- 
thon’s exception handling model and statements, plus a brief overview of devel- 
opment tools that will become more useful when you start writing larger programs 
(debugging and testing tools, for instance). Although exceptions are a fairly light- 
weight tool, this part appears after the discussion of classes because exceptions 
should now all be classes. 


Part VIII, Advanced Topics (new in the fourth edition) 

In the final part, we explore some advanced topics. Here, we study Unicode and 
byte strings, managed attribute tools like properties and descriptors, function and 
class decorators, and metaclasses. These chapters are all optional reading, because 
not all programmers need to understand the subjects they address. On the other 
hand, readers who must process internationalized text or binary data, or are re- 
sponsible for developing APIs for other programmers to use, should find something 
of interest in this part. 


Part IX, Appendixes 
The book wraps up with a pair of appendixes that give platform-specific tips for 
using Python on various computers (Appendix A) and provide solutions to the end- 
of-part exercises (Appendix B). Solutions to end-of-chapter quizzes appear in the 
chapters themselves. 

Note that the index and table of contents can be used to hunt for details, but there are 
no reference appendixes in this book (this book is a tutorial, not a reference). As men- 
tioned earlier, you can consult Python Pocket Reference, as well as other books, and the 
free Python reference manuals maintained at http://www.python.org for syntax and 
built-in tool details. 


Book Updates 


Improvements happen (and so do mis^H^H^H typos). Updates, supplements, and cor- 
rections for this book will be maintained (or referenced) on the Web at one of the 
following sites: 


http://www.oreilly.com/catalog/9780596158064 (O’Reilly’s web page for the book) 
http://www.rmi.net/~lutz (the author’s site) 
http://www.rmi.net/~lutz/about-lp.html (the author’s web page for the book) 


The last of these three URLs points to a web page for this book where I will post updates, 
but be sure to search the Web if this link becomes invalid. If I could become more 
clairvoyant, I would, but the Web changes faster than printed books. 


About the Programs in This Book 


This fourth edition of this book, and all the program examples in it, is based on Python 
version 3.0. In addition, most of its examples run under Python 2.6, as described in the 
text, and notes for Python 2.6 readers are mixed in along the way. 
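
If you are not sure which Python release is installed on your machine, a quick check along the following lines can tell you. This is only a sketch; the variable names are illustrative and are not taken from the book's own examples.

```python
import sys

# sys.version_info holds the running interpreter's version numbers
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python %d.%d" % (major, minor))

# True under any 3.X release, False under 2.6 and other 2.X releases
is_3x = major >= 3
```

The single-argument print call with parentheses is used here because it behaves the same way under both 2.6 and 3.0.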


Because this text focuses on the core language, however, you can be fairly sure that 
most of what it has to say won’t change very much in future releases of Python. Most 
of this book applies to earlier Python versions, too, except when it does not; naturally, 
if you try using extensions added after the release you’ve got, all bets are off. 


As a rule of thumb, the latest Python is the best Python. Because this book focuses on 
the core language, most of it also applies to Jython, the Java-based Python language 
implementation, as well as other Python implementations described in Chapter 2. 


Source code for the book’s examples, as well as exercise solutions, can be fetched from 
the book’s website at http://www.oreilly.com/catalog/9780596158064/. So, how do you 
run the examples? We’ll study startup details in Chapter 3, so please stay tuned for 
information on this front. 


Using Code Examples 


This book is here to help you get your job done. In general, you may use the code in 
this book in your programs and documentation. You do not need to contact us for 
permission unless you’re reproducing a significant portion of the code. For example, 
writing a program that uses several chunks of code from this book does not require 
permission. Selling or distributing a CD-ROM of examples from O’Reilly books does 
require permission. Answering a question by citing this book and quoting example 
code does not require permission. Incorporating a significant amount of example code 
from this book into your product’s documentation does require permission. 


We appreciate, but do not require, attribution. An attribution usually includes the title, 
author, publisher, and ISBN. For example: “Learning Python, Fourth Edition, by Mark 
Lutz. Copyright 2009 Mark Lutz, 978-0-596-15806-4.” 


If you feel your use of code examples falls outside fair use or the permission given above, 
feel free to contact us at permissions@oreilly.com. 


Font Conventions 


This book uses the following typographical conventions: 


Italic 
Used for email addresses, URLs, filenames, pathnames, and emphasizing new 
terms when they are first introduced 

Constant width 
Used for the contents of files and the output from commands, and to designate 
modules, methods, statements, and commands 

Constant width bold 
Used in code sections to show commands or text that would be typed by the user, 
and, occasionally, to highlight portions of code 

Constant width italic 
Used for replaceables and some comments in code sections 

<Constant width> 
Indicates a syntactic unit that should be replaced with real code 


Indicates a tip, suggestion, or general note relating to the nearby text. 


Indicates a warning or caution relating to the nearby text. 

Notes specific to this book: In this book’s examples, the % character at 
the start of a system command line stands for the system’s prompt, 
whatever that may be on your machine (e.g., C:\Python30> in a DOS 
window). Don’t type the % character (or the system prompt it sometimes 
stands for) yourself. 


Similarly, in interpreter interaction listings, do not type the >>> 
and ... characters shown at the start of lines—these are prompts that 
Python displays. Type just the text after these prompts. To help you 
remember this, user inputs are shown in bold font in this book. 

Also, you normally don’t need to type text that starts with a # in listings; 
as you'll learn, these are comments, not executable code. 
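
As a minimal sketch of these conventions (the statements are illustrative, not one of the book's own listings), code that the text shows after a >>> prompt is typed without the prompt, and the # comments can be skipped entirely:

```python
# In the book this would appear after a >>> prompt; type only the code itself.
total = 2 + 3      # text after a # is a comment, so Python ignores it
print(total)       # displays the value assigned above
```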


Safari® Books Online 


Safari Books Online is an on-demand digital library that lets you easily 
search over 7,500 technology and creative reference books and videos to 
find the answers you need quickly. 


With a subscription, you can read any page and watch any video from our library online. 
Read books on your cell phone and mobile devices. Access new titles before they are 
available for print, and get exclusive access to manuscripts in development and post 
feedback for the authors. Copy and paste code samples, organize your favorites, down- 
load chapters, bookmark key sections, create notes, print out pages, and benefit from 
tons of other time-saving features. 


O’Reilly Media has uploaded this book to the Safari Books Online service. To have full 
digital access to this book and others on similar topics from O’Reilly and other pub- 
lishers, sign up for free at http://my.safaribooksonline.com. 


How to Contact Us 
Please address comments and questions concerning this book to the publisher: 


O’Reilly Media, Inc. 

1005 Gravenstein Highway North 

Sebastopol, CA 95472 

800-998-9938 (in the United States or Canada) 
707-829-0515 (international or local) 
707-829-0104 (fax) 


We will also maintain a web page for this book, where we list errata, examples, and 
any additional information. You can access this page at: 


http://www.oreilly.com/catalog/9780596158064/ 

To comment or ask technical questions about this book, send email to: 
bookquestions@oreilly.com 


For more information about our books, conferences, Resource Centers, and the 
O’Reilly Network, see our website at: 


http://www.oreilly.com 


For book updates, be sure to also see the other links mentioned earlier in this Preface. 

Getting Started 


CHAPTER 1 


A Python Q&A Session 


If you’ve bought this book, you may already know what Python is and why it’s an 
important tool to learn. If you don’t, you probably won’t be sold on Python until you’ve 
learned the language by reading the rest of this book and have done a project or two. 
But before we jump into details, the first few pages of this book will briefly introduce 
some of the main reasons behind Python’s popularity. To begin sculpting a definition 
of Python, this chapter takes the form of a question-and-answer session, which poses 
some of the most common questions asked by beginners. 


Why Do People Use Python? 


Because there are many programming languages available today, this is the usual first 
question of newcomers. Given that there are roughly 1 million Python users out there 
at the moment, there really is no way to answer this question with complete accuracy; 
the choice of development tools is sometimes based on unique constraints or personal 
preference. 


But after teaching Python to roughly 225 groups and over 3,000 students during the 
last 12 years, some common themes have emerged. The primary factors cited by Python 
users seem to be these: 


Software quality 
For many, Python’s focus on readability, coherence, and software quality in general 
sets it apart from other tools in the scripting world. Python code is designed to be 
readable, and hence reusable and maintainable—much more so than traditional 
scripting languages. The uniformity of Python code makes it easy to understand, 
even if you did not write it. In addition, Python has deep support for more advanced 
software reuse mechanisms, such as object-oriented programming (OOP). 


Developer productivity 
Python boosts developer productivity many times beyond compiled or statically 
typed languages such as C, C++, and Java. Python code is typically one-third to 
one-fifth the size of equivalent C++ or Java code. That means there is less to type, 
less to debug, and less to maintain after the fact. Python programs also run 
immediately, without the lengthy compile and link steps required by some other tools, 
further boosting programmer speed. 


Program portability 

Most Python programs run unchanged on all major computer platforms. Porting 
Python code between Linux and Windows, for example, is usually just a matter of 
copying a script’s code between machines. Moreover, Python offers multiple op- 
tions for coding portable graphical user interfaces, database access programs, web- 
based systems, and more. Even operating system interfaces, including program 
launches and directory processing, are as portable in Python as they can possibly 
be. 


Support libraries 

Python comes with a large collection of prebuilt and portable functionality, known 
as the standard library. This library supports an array of application-level pro- 
gramming tasks, from text pattern matching to network scripting. In addition, 
Python can be extended with both homegrown libraries and a vast collection of 
third-party application support software. Python’s third-party domain offers tools 
for website construction, numeric programming, serial port access, game devel- 
opment, and much more. The NumPy extension, for instance, has been described 
as a free and more powerful equivalent to the Matlab numeric programming 
system. 


Component integration 
Python scripts can easily communicate with other parts of an application, using a 
variety of integration mechanisms. Such integrations allow Python to be used as a 
product customization and extension tool. Today, Python code can invoke C and 
C++ libraries, can be called from C and C++ programs, can integrate with Java 
and .NET components, can communicate over frameworks such as COM, can 
interface with devices over serial ports, and can interact over networks with inter- 
faces like SOAP, XML-RPC, and CORBA. It is not a standalone tool. 

Enjoyment 
Because of Python’s ease of use and built-in toolset, it can make the act of pro- 
gramming more pleasure than chore. Although this may be an intangible benefit, 
its effect on productivity is an important asset. 


Of these factors, the first two (quality and productivity) are probably the most com- 
pelling benefits to most Python users. 


Software Quality 


By design, Python implements a deliberately simple and readable syntax and a highly 
coherent programming model. As a slogan at a recent Python conference attests, the 
net result is that Python seems to “fit your brain”—that is, features of the language 
interact in consistent and limited ways and follow naturally from a small set of core 
concepts. This makes the language easier to learn, understand, and remember. In practice, 
Python programmers do not need to constantly refer to manuals when reading or 
writing code; it’s a consistently designed system that many find yields surprisingly 
regular-looking code. 


By philosophy, Python adopts a somewhat minimalist approach. This means that al- 
though there are usually multiple ways to accomplish a coding task, there is usually 
just one obvious way, a few less obvious alternatives, and a small set of coherent in- 
teractions everywhere in the language. Moreover, Python doesn’t make arbitrary deci- 
sions for you; when interactions are ambiguous, explicit intervention is preferred over 
“magic.” In the Python way of thinking, explicit is better than implicit, and simple is 
better than complex.* 
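
The design principles quoted here can be displayed from Python itself with the import this command. The sketch below goes one step further and picks out the "better than" lines. It assumes the standard CPython this module, which keeps the text ROT13-encoded in an attribute named s; that attribute is an implementation detail rather than a documented interface.

```python
import codecs
import this  # the first import prints the full list of design principles

# Assumption: CPython's this module stores the text ROT13-encoded in this.s
zen = codecs.decode(this.s, "rot_13")

# Show only the comparative rules, such as the two quoted in the text
for line in zen.splitlines():
    if "better than" in line:
        print(line)
```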


Beyond such design themes, Python includes tools such as modules and OOP that 
naturally promote code reusability. And because Python is focused on quality, so too, 
naturally, are Python programmers. 


Developer Productivity 


During the great Internet boom of the mid-to-late 1990s, it was difficult to find enough 
programmers to implement software projects; developers were asked to implement 
systems as fast as the Internet evolved. Today, in an era of layoffs and economic reces- 
sion, the picture has shifted. Programming staffs are often now asked to accomplish 
the same tasks with even fewer people. 


In both of these scenarios, Python has shined as a tool that allows programmers to get 
more done with less effort. It is deliberately optimized for speed of development—its 
simple syntax, dynamic typing, lack of compile steps, and built-in toolset allow pro- 
grammers to develop programs in a fraction of the time needed when using some other 
tools. The net effect is that Python typically boosts developer productivity many times 
beyond the levels supported by traditional languages. That’s good news in both boom 
and bust times, and everywhere the software industry goes in between. 
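
To give a rough feel for this, here is a small sketch that counts word frequencies using only built-in tools. The sample text and variable names are made up for illustration; note that nothing is declared and there is no compile step before running it.

```python
text = "the quick brown fox jumps over the lazy dog the end"

# Dynamic typing: counts is simply a dictionary, with no type declarations
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# The word with the highest count
most = max(counts, key=counts.get)
print(most, counts[most])
```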


Is Python a “Scripting Language”? 


Python is a general-purpose programming language that is often applied in scripting 
roles. It is commonly defined as an object-oriented scripting language—a definition that 
blends support for OOP with an overall orientation toward scripting roles. In fact, 
people often use the word “script” instead of “program” to describe a Python code file. 
In this book, the terms “script” and “program” are used interchangeably, with a slight 
preference for “script” to describe a simpler top-level file and “program” to refer to a 
more sophisticated multifile application. 


* For a more complete look at the Python philosophy, type the command import this at any Python interactive 
prompt (you'll see how in Chapter 2). This invokes an “Easter egg” hidden in Python: a collection of design 
principles underlying Python. The acronym EIBTI is now fashionable jargon for the “explicit is better than 
implicit” rule. 


Because the term “scripting language” has so many different meanings to different 
observers, some would prefer that it not be applied to Python at all. In fact, people tend 
to make three very different associations, some of which are more useful than others, 
when they hear Python labeled as such: 


Shell tools 
Sometimes when people hear Python described as a scripting language, they think 
it means that Python is a tool for coding operating-system-oriented scripts. Such 
programs are often launched from console command lines and perform tasks such 
as processing text files and launching other programs. 


Python programs can and do serve such roles, but this is just one of dozens of 
common Python application domains. It is not just a better shell-script language. 


Control language 
To others, scripting refers to a “glue” layer used to control and direct (i.e., script) 
other application components. Python programs are indeed often deployed in the 
context of larger applications. For instance, to test hardware devices, Python pro- 
grams may call out to components that give low-level access to a device. Similarly, 
programs may run bits of Python code at strategic points to support end-user 
product customization without the need to ship and recompile the entire system’s 
source code. 


Python’s simplicity makes it a naturally flexible control tool. Technically, though, 
this is also just a common Python role; many (perhaps most) Python programmers 
code standalone scripts without ever using or knowing about any integrated com- 
ponents. It is not just a control language. 


Ease of use 
Probably the best way to think of the term “scripting language” is that it refers to 
a simple language used for quickly coding tasks. This is especially true when the 
term is applied to Python, which allows much faster program development than 
compiled languages like C++. Its rapid development cycle fosters an exploratory, 
incremental mode of programming that has to be experienced to be appreciated. 


Don’t be fooled, though—Python is not just for simple tasks. Rather, it makes tasks 
simple by its ease of use and flexibility. Python has a simple feature set, but it allows 
programs to scale up in sophistication as needed. Because of that, it is commonly 
used for quick tactical tasks and longer-term strategic development. 


So, is Python a scripting language or not? It depends on whom you ask. In general, the 
term “scripting” is probably best used to describe the rapid and flexible mode of de- 
velopment that Python supports, rather than a particular application domain. 

OK, but What’s the Downside? 


After using it for 17 years and teaching it for 12, the only downside to Python I’ve found 
is that, as currently implemented, its execution speed may not always be as fast as that 
of compiled languages such as C and C++. 


We'll talk about implementation concepts in detail later in this book. In short, the 
standard implementations of Python today compile (i.e., translate) source code state- 
ments to an intermediate format known as byte code and then interpret the byte code. 
Byte code provides portability, as it is a platform-independent format. However, be- 
cause Python is not compiled all the way down to binary machine code (e.g., instruc- 
tions for an Intel chip), some programs will run more slowly in Python than in a fully 
compiled language like C. 
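
The byte code described above can be inspected directly. The following sketch uses the standard library's dis module on a throwaway function (add is a name made up for this example); the exact instructions printed vary from one Python version to the next.

```python
import dis

def add(a, b):
    return a + b

# Every function carries its compiled byte code in a code object
raw = add.__code__.co_code
print(type(raw), len(raw))

# dis prints a readable listing of the byte code instructions
dis.dis(add)
```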


Whether you will ever care about the execution speed difference depends on what kinds 
of programs you write. Python has been optimized numerous times, and Python code 
runs fast enough by itself in most application domains. Furthermore, whenever you do 
something “real” in a Python script, like processing a file or constructing a graphical 
user interface (GUI), your program will actually run at C speed, since such tasks are 
immediately dispatched to compiled C code inside the Python interpreter. More fun- 
damentally, Python’s speed-of-development gain is often far more important than any 
speed-of-execution loss, especially given modern computer speeds. 
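
One rough way to explore this point is to time a hand-coded loop against the built-in sum function, which is coded in C in the standard CPython interpreter. This is only a sketch: loop_sum is a name invented for the example, and the timings depend on your machine and Python version, so none are quoted here.

```python
import timeit

numbers = list(range(100000))

def loop_sum(values):
    # Each iteration here runs as interpreted byte code
    total = 0
    for value in values:
        total += value
    return total

# Both approaches must agree on the result
assert loop_sum(numbers) == sum(numbers)

loop_time = timeit.timeit(lambda: loop_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)
print("loop: %.4fs  built-in sum: %.4fs" % (loop_time, builtin_time))
```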


Even at today’s CPU speeds, though, there still are some domains that do require op- 
timal execution speeds. Numeric programming and animation, for example, often need 
at least their core number-crunching components to run at C speed (or better). If you 
work in such a domain, you can still use Python—simply split off the parts of the 
application that require optimal speed into compiled extensions, and link those into 
your system for use in Python scripts. 


We won’t talk about extensions much in this text, but this is really just an instance of 
the Python-as-control-language role we discussed earlier. A prime example of this dual 
language strategy is the NumPy numeric programming extension for Python; by com- 
bining compiled and optimized numeric extension libraries with the Python language, 
NumPy turns Python into a numeric programming tool that is efficient and easy to use. 
You may never need to code such extensions in your own Python work, but they provide 
a powerful optimization mechanism if you ever do. 


Who Uses Python Today? 


At this writing, the best estimate anyone can seem to make of the size of the Python 
user base is that there are roughly 1 million Python users around the world today (plus 
or minus a few). This estimate is based on various statistics, like download rates and 
developer surveys. Because Python is open source, a more exact count is difficult— 
there are no license registrations to tally. Moreover, Python is automatically included 
with Linux distributions, Macintosh computers, and some products and hardware, 
further clouding the user-base picture. 


In general, though, Python enjoys a large user base and a very active developer com- 
munity. Because Python has been around for some 19 years and has been widely used, 
it is also very stable and robust. Besides being employed by individual users, Python is 
also being applied in real revenue-generating products by real companies. For instance: 


* Google makes extensive use of Python in its web search systems, and employs 
Python’s creator. 

* The YouTube video sharing service is largely written in Python. 

* The popular BitTorrent peer-to-peer file sharing system is a Python program. 

* Google’s popular App Engine web development framework uses Python as its 
application language. 

* EVE Online, a Massively Multiplayer Online Game (MMOG), makes extensive use 
of Python. 

* Maya, a powerful integrated 3D modeling and animation system, provides a 
Python scripting API. 

* Intel, Cisco, Hewlett-Packard, Seagate, Qualcomm, and IBM use Python for 
hardware testing. 

* Industrial Light & Magic, Pixar, and others use Python in the production of 
animated movies. 

* JPMorgan Chase, UBS, Getco, and Citadel apply Python for financial market 
forecasting. 

* NASA, Los Alamos, Fermilab, JPL, and others use Python for scientific 
programming tasks. 

* iRobot uses Python to develop commercial robotic devices. 

* ESRI uses Python as an end-user customization tool for its popular GIS mapping 
products. 

* The NSA uses Python for cryptography and intelligence analysis. 

* The IronPort email server product uses more than 1 million lines of Python code 
to do its job. 

* The One Laptop Per Child (OLPC) project builds its user interface and activity 
model in Python. 


And so on. Probably the only common thread amongst the companies using Python 
today is that Python is used all over the map, in terms of application domains. Its 
general-purpose nature makes it applicable to almost all fields, not just one. In fact, it’s 
safe to say that virtually every substantial organization writing software is using Python, 
whether for short-term tactical tasks, such as testing and administration, or for long- 
term strategic product development. Python has proven to work well in both modes. 

For more details on companies using Python today, see Python’s website at 
http://www.python.org. 


What Can I Do with Python? 


In addition to being a well-designed programming language, Python is useful for ac- 
complishing real-world tasks—the sorts of things developers do day in and day out. 
It’s commonly used in a variety of domains, as a tool for scripting other components 
and implementing standalone programs. In fact, as a general-purpose language, 
Python’s roles are virtually unlimited: you can use it for everything from website de- 
velopment and gaming to robotics and spacecraft control. 


However, the most common Python roles currently seem to fall into a few broad cat- 
egories. The next few sections describe some of Python’s most common applications 
today, as well as tools used in each domain. We won't be able to explore the tools 
mentioned here in any depth—if you are interested in any of these topics, see the Python 
website or other resources for more details. 


Systems Programming 


Python’s built-in interfaces to operating-system services make it ideal for writing port- 
able, maintainable system-administration tools and utilities (sometimes called shell 
tools). Python programs can search files and directory trees, launch other programs, do 
parallel processing with processes and threads, and so on. 


Python’s standard library comes with POSIX bindings and support for all the usual OS 
tools: environment variables, files, sockets, pipes, processes, multiple threads, regular 
expression pattern matching, command-line arguments, standard stream interfaces, 
shell-command launchers, filename expansion, and more. In addition, the bulk of Py- 
thon’s system interfaces are designed to be portable; for example, a script that copies 
directory trees typically runs unchanged on all major Python platforms. The Stackless 
Python system, used by EVE Online, also offers advanced solutions to multiprocessing 
requirements. 
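
As a small sketch of two of the tasks listed above, the following script searches a directory tree for files and then launches another program. The directory layout and file names are made up for the example. The subprocess.run call used here was added in Python 3.5, well after the 3.0 release this book is based on, so treat it as an assumption about your installed version.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Build a small throwaway directory tree to search
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for relpath in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
    with open(os.path.join(root, relpath), "w") as handle:
        handle.write("sample\n")

# Walk the tree and collect every file whose name ends in .txt
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".txt"):
            found.append(os.path.join(dirpath, name))
print(sorted(found))

# Launch another program: here, a second Python interpreter
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(result.stdout.strip())

# Remove the throwaway tree
shutil.rmtree(root)
```

Because the script builds its paths with os.path.join rather than hardcoded separators, it should run unchanged on both Windows and Unix-like systems.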


GUIs 


Python’s simplicity and rapid turnaround also make it a good match for graphical user 
interface programming. Python comes with a standard object-oriented interface to the 
Tk GUI API called tkinter (Tkinter in 2.6) that allows Python programs to implement 
portable GUIs with a native look and feel. Python/tkinter GUIs run unchanged on 
Microsoft Windows, X Windows (on Unix and Linux), and the Mac OS (both Classic 
and OS X). A free extension package, PMW, adds advanced widgets to the tkinter 
toolkit. In addition, the wxPython GUI API, based on a C++ library, offers an alternative 
toolkit for constructing portable GUIs in Python. 
Higher-level toolkits such as PythonCard and Dabo are built on top of base APIs such 
as wxPython and tkinter. With the proper library, you can also use GUI support in 
other toolkits in Python, such as Qt with PyQt, GTK with PyGTK, MFC with 
PyWin32, .NET with IronPython, and Swing with Jython (the Java version of Python, 
described in Chapter 2) or JPype. For applications that run in web browsers or have 
simple interface requirements, both Jython and Python web frameworks and server- 
side CGI scripts, described in the next section, provide additional user interface 
options. 


Internet Scripting 


Python comes with standard Internet modules that allow Python programs to perform 
a wide variety of networking tasks, in client and server modes. Scripts can communicate 
over sockets; extract form information sent to server-side CGI scripts; transfer files by 
FTP; parse, generate, and analyze XML files; send, receive, compose, and parse email; 
fetch web pages by URLs; parse the HTML and XML of fetched web pages; commu- 
nicate over XML-RPC, SOAP, and Telnet; and more. Python’s libraries make these 
tasks remarkably simple. 
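
A few of these tasks can be sketched without a network connection. The example below parses a small XML document, composes an email message, and splits a URL into its parts. The XML text and email addresses are placeholders invented for illustration, and the EmailMessage class assumes a recent Python 3.X release rather than 3.0.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from urllib.parse import urlparse

# Parse a small XML document held in a string
doc = ET.fromstring("<books><book title='Learning Python'/></books>")
titles = [book.get("title") for book in doc.findall("book")]
print(titles)

# Compose an email message (the addresses are placeholders)
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "reader@example.com"
msg["Subject"] = "Hello"
msg.set_content("Sent from a Python script.")
print(msg["Subject"])

# Split a URL into its components
parts = urlparse("http://www.python.org/doc/index.html")
print(parts.netloc, parts.path)
```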


In addition, a large collection of third-party tools are available on the Web for doing 
Internet programming in Python. For instance, the HTMLGen system generates HTML 
files from Python class-based descriptions, the mod_python package runs Python effi- 
ciently within the Apache web server and supports server-side templating with its Py- 
thon Server Pages, and the Jython system provides for seamless Python/Java integration 
and supports coding of server-side applets that run on clients. 


In addition, full-blown web development framework packages for Python, such as 
Django, TurboGears, web2py, Pylons, Zope, and WebWare, support quick construction 
of full-featured and production-quality websites with Python. Many of these include 
features such as object-relational mappers, a Model/View/Controller architecture, 
server-side scripting and templating, and AJAX support, to provide complete and 
enterprise-level web development solutions. 


Component Integration 


We discussed the component integration role earlier when describing Python as a con- 
trol language. Python’s ability to be extended by and embedded in C and C++ systems 
makes it useful as a flexible glue language for scripting the behavior of other systems 
and components. For instance, integrating a C library into Python enables Python to 
test and launch the library’s components, and embedding Python in a product enables 
onsite customizations to be coded without having to recompile the entire product (or 
ship its source code at all). 
Tools such as the SWIG and SIP code generators can automate much of the work 
needed to link compiled components into Python for use in scripts, and the Cython 
system allows coders to mix Python and C-like code. Larger frameworks, such as Py- 
thon’s COM support on Windows, the Jython Java-based implementation, the Iron- 
Python .NET-based implementation, and various CORBA toolkits for Python, provide 
alternative ways to script components. On Windows, for example, Python scripts can 
use frameworks to script Word and Excel. 


Database Programming 


For traditional database demands, there are Python interfaces to all commonly used 
relational database systems—Sybase, Oracle, Informix, ODBC, MySQL, PostgreSQL, 
SQLite, and more. The Python world has also defined a portable database API for ac- 
cessing SQL database systems from Python scripts, which looks the same on a variety 
of underlying database systems. For instance, because the vendor interfaces implement 
the portable API, a script written to work with the free MySQL system will work largely 
unchanged on other systems (such as Oracle); all you have to do is replace the under- 
lying vendor interface. 


Python’s standard pickle module provides a simple object persistence system—it allows 
programs to easily save and restore entire Python objects to files and file-like objects. 
On the Web, you’ll also find a third-party open source system named ZODB that pro- 
vides a complete object-oriented database system for Python scripts, and others (such 
as SQLObject and SQLAlchemy) that map relational tables onto Python’s class model. 
Furthermore, as of Python 2.5, the in-process SQLite embedded SQL database engine 
is a standard part of Python itself. 
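
As a rough sketch of the two standard library tools just mentioned, the following code pickles a nested object to a file-like object and then runs a few SQL statements through the sqlite3 module. The record contents, table name, and column names are made up for illustration, and the "?" placeholder style shown is the one sqlite3 uses; other vendor interfaces may use a different placeholder style, so treat this as an outline rather than a universal recipe:

```python
import io
import pickle
import sqlite3

# Object persistence: save a nested object to a file-like object, then restore it.
record = {"name": "Bob", "jobs": ["dev", "mgr"], "pay": 40000}   # illustrative data
buffer = io.BytesIO()              # stands in for a real file opened in binary mode
pickle.dump(record, buffer)
buffer.seek(0)                     # rewind before reading the bytes back
restored = pickle.load(buffer)
print(restored == record)          # the restored copy compares equal to the original

# Portable database API, shown here with the standard library's SQLite interface.
conn = sqlite3.connect(":memory:")             # in-memory database, nothing on disk
cursor = conn.cursor()
cursor.execute("CREATE TABLE people (name TEXT, pay INTEGER)")
cursor.executemany("INSERT INTO people VALUES (?, ?)",
                   [("Bob", 40000), ("Sue", 50000)])
conn.commit()
cursor.execute("SELECT name FROM people WHERE pay > ? ORDER BY name", (45000,))
rows = cursor.fetchall()           # a list of row tuples
print(rows)
conn.close()
```

The connect, cursor, execute, and fetchall calls are the part that the portable API standardizes, which is why scripts like this tend to need few changes when the underlying database system is swapped.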


Rapid Prototyping 


To Python programs, components written in Python and C look the same. Because of 
this, it’s possible to prototype systems in Python initially, and then move selected com- 
ponents to a compiled language such as C or C++ for delivery. Unlike some prototyping 
tools, Python doesn’t require a complete rewrite once the prototype has solidified. Parts 
of the system that don’t require the efficiency of a language such as C++ can remain 
coded in Python for ease of maintenance and use. 


Numeric and Scientific Programming 


The NumPy numeric programming extension for Python mentioned earlier includes 
such advanced tools as an array object, interfaces to standard mathematical libraries, 
and much more. By integrating Python with numeric routines coded in a compiled 
language for speed, NumPy turns Python into a sophisticated yet easy-to-use numeric 
programming tool that can often replace existing code written in traditional compiled 
languages such as FORTRAN or C++. Additional numeric tools for Python support 
animation, 3D visualization, parallel processing, and so on. The popular SciPy and 
ScientificPython extensions, for example, provide additional libraries of scientific pro- 
gramming tools and use NumPy code. 


Gaming, Images, Serial Ports, XML, Robots, and More 


Python is commonly applied in more domains than can be mentioned here. For exam- 
ple, you can do: 


* Game programming and multimedia in Python with the pygame system 

* Serial port communication on Windows, Linux, and more with the PySerial 
extension 

* Image processing with PIL, PyOpenGL, Blender, Maya, and others 

* Robot control programming with the PyRo toolkit 

* XML parsing with the xml library package, the xmlrpclib module, and third-party 
extensions 

* Artificial intelligence programming with neural network simulators and expert 
system shells 

* Natural language analysis with the NLTK package 


You can even play solitaire with the PySol program. You'll find support for many such 
fields at the PyPI website, and via web searches (search Google or http://www.python.org 
for links). 


Many of these specific domains are largely just instances of Python’s component inte- 
gration role in action again. Adding it as a frontend to libraries of components written 
in a compiled language such as C makes Python useful for scripting in a wide variety 
of domains. As a general-purpose language that supports integration, Python is widely 
applicable. 


How Is Python Supported? 


As a popular open source system, Python enjoys a large and active development com- 
munity that responds to issues and develops enhancements with a speed that many 
commercial software developers would find remarkable (if not downright shocking). 
Python developers coordinate work online with a source-control system. Changes fol- 
low a formal PEP (Python Enhancement Proposal) protocol and must be accompanied 
by extensions to Python’s extensive regression testing system. In fact, modifying 
Python today is roughly as involved as changing commercial software—a far cry from 
Python’s early days, when an email to its creator would suffice, but a good thing given 
its current large user base. 


The PSF (Python Software Foundation), a formal nonprofit group, organizes conferences 
and deals with intellectual property issues. Numerous Python conferences are 
held around the world; O’Reilly’s OSCON and the PSF’s PyCon are the largest. The 
former of these addresses multiple open source projects, and the latter is a Python-only 
event that has experienced strong growth in recent years. Attendance at PyCon 2008 
nearly doubled from the prior year, growing from 586 attendees in 2007 to over 1,000 
in 2008. This was on the heels of a 40% attendance increase in 2007, from 410 in 2006. 
PyCon 2009 had 943 attendees, a slight decrease from 2008, but a still very strong 
showing during a global recession. 


What Are Python’s Technical Strengths? 


Naturally, this is a developer’s question. If you don’t already have a programming 
background, the language in the next few sections may be a bit baffling—don’t worry, 
we'll explore all of these terms in more detail as we proceed through this book. For 
developers, though, here is a quick introduction to some of Python’s top technical 
features. 


It’s Object-Oriented 


Python is an object-oriented language, from the ground up. Its class model supports 
advanced notions such as polymorphism, operator overloading, and multiple inheri- 
tance; yet, in the context of Python’s simple syntax and typing, OOP is remarkably easy 
to apply. In fact, if you don’t understand these terms, you’ll find they are much easier 
to learn with Python than with just about any other OOP language available. 
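
To make those three terms a bit more concrete, here is a small sketch. The Vector, Named, and NamedVector classes and the total function are invented for this illustration only; they are not part of Python or of any library:

```python
class Vector:
    """A 2D vector; an illustrative class, not a library type."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):            # operator overloading: runs for v1 + v2
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):                  # controls how instances are displayed
        return "Vector(%r, %r)" % (self.x, self.y)


class Named:
    """A mix-in that adds a labeled description to any class."""
    label = "object"

    def describe(self):
        return "%s: %r" % (self.label, self)


class NamedVector(Named, Vector):        # multiple inheritance: two superclasses
    label = "point"


def total(first, *rest):
    """Polymorphism: works for any objects that support the + operator."""
    result = first
    for item in rest:
        result = result + item
    return result


print(total(1, 2, 3))                        # numbers
print(total("sp", "am"))                     # strings
print(total(Vector(1, 2), Vector(3, 4)))     # instances of our own class
print(NamedVector(1, 2).describe())          # behavior inherited from both parents
```

Notice that total never checks the type of its arguments; it simply applies + and lets each object decide what that means.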


Besides serving as a powerful code structuring and reuse device, Python’s OOP nature 
makes it ideal as a scripting tool for object-oriented systems languages such as C++ 
and Java. For example, with the appropriate glue code, Python programs can subclass 
(specialize) classes implemented in C++, Java, and C#. 


Of equal significance, OOP is an option in Python; you can go far without having to 
become an object guru all at once. Much like C++, Python supports both procedural 
and object-oriented programming modes. Its object-oriented tools can be applied if 
and when constraints allow. This is especially useful in tactical development modes, 
which preclude design phases. 


It's Free 


Python is completely free to use and distribute. As with other open source software, 
such as Tcl, Perl, Linux, and Apache, you can fetch the entire Python system’s source 
code for free on the Internet. There are no restrictions on copying it, embedding it in 
your systems, or shipping it with your products. In fact, you can even sell Python’s 
source code, if you are so inclined. 


But don’t get the wrong idea: “free” doesn’t mean “unsupported.” On the contrary, 
the Python online community responds to user queries with a speed that most com- 
mercial software help desks would do well to try to emulate. Moreover, because Python 
comes with complete source code, it empowers developers, leading to the creation of 
a large team of implementation experts. Although studying or changing a programming 
language’s implementation isn’t everyone’s idea of fun, it’s comforting to know that 
you can do so if you need to. You’re not dependent on the whims of a commercial 
vendor; the ultimate documentation source is at your disposal. 


As mentioned earlier, Python development is performed by a community that largely 
coordinates its efforts over the Internet. It consists of Python’s creator—Guido van 
Rossum, the officially anointed Benevolent Dictator for Life (BDFL) of Python—plus a 
supporting cast of thousands. Language changes must follow a formal enhancement 
procedure and be scrutinized by both other developers and the BDFL. Happily, this 
tends to make Python more conservative with changes than some other languages. 


It’s Portable 


The standard implementation of Python is written in portable ANSI C, and it compiles 
and runs on virtually every major platform currently in use. For example, Python pro- 
grams run today on everything from PDAs to supercomputers. As a partial list, Python 
is available on: 

* Linux and Unix systems 

* Microsoft Windows and DOS (all modern flavors) 

* Mac OS (both OS X and Classic) 

* BeOS, OS/2, VMS, and QNX 

* Real-time systems such as VxWorks 

* Cray supercomputers and IBM mainframes 

* PDAs running Palm OS, PocketPC, and Linux 

* Cell phones running Symbian OS and Windows Mobile 

* Gaming consoles and iPods 

* And more 


Like the language interpreter itself, the standard library modules that ship with Python 
are implemented to be as portable across platform boundaries as possible. Further, 
Python programs are automatically compiled to portable byte code, which runs the 
same on any platform with a compatible version of Python installed (more on this in 
the next chapter). 


What that means is that Python programs using the core language and standard libraries 
run the same on Linux, Windows, and most other systems with a Python interpreter. 
Most Python ports also contain platform-specific extensions (e.g., COM support on 
Windows), but the core Python language and libraries work the same everywhere. As 
mentioned earlier, Python also includes an interface to the Tk GUI toolkit called tkinter 
(Tkinter in 2.6), which allows Python programs to implement full-featured graphical 
user interfaces that run on all major GUI platforms without program changes. 


It’s Powerful 


From a features perspective, Python is something of a hybrid. Its toolset places it be- 
tween traditional scripting languages (such as Tcl, Scheme, and Perl) and systems de- 
velopment languages (such as C, C++, and Java). Python provides all the simplicity 
and ease of use of a scripting language, along with more advanced software-engineering 
tools typically found in compiled languages. Unlike some scripting languages, this 
combination makes Python useful for large-scale development projects. As a preview, 
here are some of the main things you'll find in Python’s toolbox: 


Dynamic typing 
Python keeps track of the kinds of objects your program uses when it runs; it 
doesn’t require complicated type and size declarations in your code. In fact, as 
you'll see in Chapter 6, there is no such thing as a type or variable declaration 
anywhere in Python. Because Python code does not constrain data types, it is also 
usually automatically applicable to a whole range of objects. 


Automatic memory management 
Python automatically allocates objects and reclaims (“garbage collects”) them 
when they are no longer used, and most can grow and shrink on demand. As you'll 
learn, Python keeps track of low-level memory details so you don’t have to. 

Programming-in-the-large support 
For building larger systems, Python includes tools such as modules, classes, and 
exceptions. These tools allow you to organize systems into components, use OOP 
to reuse and customize code, and handle events and errors gracefully. 

Built-in object types 
Python provides commonly used data structures such as lists, dictionaries, and 
strings as intrinsic parts of the language; as you'll see, they’re both flexible and easy 
to use. For instance, built-in objects can grow and shrink on demand, can be 
arbitrarily nested to represent complex information, and more. 


Built-in tools 
To process all those object types, Python comes with powerful and standard operations, 
including concatenation (joining collections), slicing (extracting sections), sorting, 
mapping, and more. 

Library utilities 
For more specific tasks, Python also comes with a large collection of precoded 
library tools that support everything from regular expression matching to net- 
working. Once you learn the language itself, Python’s library tools are where much 
of the application-level action occurs. 

Third-party utilities 
Because Python is open source, developers are encouraged to contribute precoded 
tools that support tasks beyond those supported by its built-ins; on the Web, you'll 
find free support for COM, imaging, CORBA ORBs, XML, database access, and 
much more. 


Despite the array of tools in Python, it retains a remarkably simple syntax and design. 
The result is a powerful programming tool with all the usability of a scripting language. 
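
As a brief, hedged preview of a few of the toolbox items above, the following sketch uses only built-in objects and operations. The ends function and the menu data are made up for the example:

```python
# Dynamic typing: no declarations, and the same code works on many kinds of objects.
def ends(seq):
    """Join (concatenate) the first and last items of any sliceable sequence."""
    return seq[:1] + seq[-1:]           # slicing, then concatenation

print(ends([1, 2, 3]))                  # a list in, a list out
print(ends("spam"))                     # a string in, a string out

# Built-in object types: nested freely, and they grow on demand.
menu = {"spam": [1.25, 2.50], "eggs": [0.99]}   # a dictionary of lists
menu["spam"].append(3.75)               # the nested list grows in place
menu["toast"] = [0.50]                  # so does the dictionary

# Built-in tools: sorting and mapping.
names = sorted(menu)                    # the dictionary's keys, in sorted order
cheapest = list(map(min, (menu[name] for name in names)))
print(names, cheapest)

# Programming-in-the-large support: exceptions let you handle errors gracefully.
try:
    price = menu["ham"]                 # there is no such key
except KeyError as error:
    missing = error.args[0]
    print("not on the menu:", missing)
```

None of this requires a type or size declaration, and Python reclaims the objects' memory automatically once they are no longer referenced.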


It’s Mixable 


Python programs can easily be “glued” to components written in other languages in a 
variety of ways. For example, Python’s C API lets C programs call and be called by 
Python programs flexibly. That means you can add functionality to the Python system 
as needed, and use Python programs within other environments or systems. 


Mixing Python with libraries coded in languages such as C or C++, for instance, makes 
it an easy-to-use frontend language and customization tool. As mentioned earlier, this 
also makes Python good at rapid prototyping; systems may be implemented in Python 
first, to leverage its speed of development, and later moved to C for delivery, one piece 
at a time, according to performance demands. 


It’s Easy to Use 


To run a Python program, you simply type it and run it. There are no intermediate 
compile and link steps, like there are for languages such as C or C++. Python executes 
programs immediately, which makes for an interactive programming experience and 
rapid turnaround after program changes—in many cases, you can witness the effect of 
a program change as fast as you can type it. 


Of course, development cycle turnaround is only one aspect of Python’s ease of use. It 
also provides a deliberately simple syntax and powerful built-in tools. In fact, some 
have gone so far as to call Python “executable pseudocode.” Because it eliminates much 
of the complexity in other tools, Python programs are simpler, smaller, and more flexible 
than equivalent programs in languages like C, C++, and Java. 


It’s Easy to Learn 


This brings us to a key point of this book: compared to other programming languages, 
the core Python language is remarkably easy to learn. In fact, you can expect to be 
coding significant Python programs in a matter of days (or perhaps in just hours, if 
you’re already an experienced programmer). That’s good news for professional developers 
seeking to learn the language to use on the job, as well as for end users of systems 
that expose a Python layer for customization or control. 


Today, many systems rely on the fact that end users can quickly learn enough Python 
to tailor their Python customizations’ code onsite, with little or no support. Although 
Python does have advanced programming tools, its core language will still seem simple 
to beginners and gurus alike. 


It’s Named After Monty Python 


OK, this isn’t quite a technical strength, but it does seem to be a surprisingly well-kept 
secret that I wish to expose up front. Despite all the reptile icons in the Python world, 
the truth is that Python creator Guido van Rossum named it after the BBC comedy 
series Monty Python’s Flying Circus. He is a big fan of Monty Python, as are many 
software developers (indeed, there seems to almost be a symmetry between the two 
fields). 


This legacy inevitably adds a humorous quality to Python code examples. For instance, 
the traditional “foo” and “bar” for generic variable names become “spam” and “eggs” 
in the Python world. The occasional “Brian,” “ni,” and “shrubbery” likewise owe their 
appearances to this namesake. It even impacts the Python community at large: talks at 
Python conferences are regularly billed as “The Spanish Inquisition.” 


All of this is, of course, very funny if you are familiar with the show, but less so other- 
wise. You don’t need to be familiar with the series to make sense of examples that 
borrow references to Monty Python (including many you will see in this book), but at 
least you now know their root. 


How Does Python Stack Up to Language X? 


Finally, to place it in the context of what you may already know, people sometimes 
compare Python to languages such as Perl, Tcl, and Java. We talked about performance 
earlier, so here we’ll focus on functionality. While other languages are also useful tools 
to know and use, many people find that Python: 


* Is more powerful than Tcl. Python’s support for “programming in the large” makes 
it applicable to the development of larger systems. 

* Has a cleaner syntax and simpler design than Perl, which makes it more readable 
and maintainable and helps reduce program bugs. 

* Is simpler and easier to use than Java. Python is a scripting language, but Java 
inherits much of the complexity and syntax of systems languages such as C++. 

* Is simpler and easier to use than C++, but it doesn’t often compete with C++; as 
a scripting language, Python typically serves different roles. 

* Is both more powerful and more cross-platform than Visual Basic. Its open source 
nature also means it is not controlled by a single company. 

* Is more readable and general-purpose than PHP. Python is sometimes used to 
construct websites, but it’s also widely used in nearly every other computer domain, 
from robotics to movie animation. 

* Is more mature and has a more readable syntax than Ruby. Unlike Ruby and Java, 
OOP is an option in Python: Python does not impose OOP on users or projects 
to which it may not apply. 

* Has the dynamic flavor of languages like Smalltalk and Lisp, but also has a simple, 
traditional syntax accessible to developers as well as end users of customizable 
systems. 


Especially for programs that do more than scan text files, and that might have to be 
read in the future by others (or by you!), many people find that Python fits the bill better 
than any other scripting or programming language available today. Furthermore, unless 
your application requires peak performance, Python is often a viable alternative to 
systems development languages such as C, C++, and Java: Python code will be much 
less difficult to write, debug, and maintain. 


Of course, your author has been a card-carrying Python evangelist since 1992, so take 
these comments as you may. They do, however, reflect the common experience of many 
developers who have taken time to explore what Python has to offer. 


Chapter Summary 


And that concludes the hype portion of this book. In this chapter, we’ve explored some 
of the reasons that people pick Python for their programming tasks. We’ve also seen 
how it is applied and looked at a representative sample of who is using it today. My 
goal is to teach Python, though, not to sell it. The best way to judge a language is to 
see it in action, so the rest of this book focuses entirely on the language details we’ve 
glossed over here. 


The next two chapters begin our technical introduction to the language. In them, we’ll 
explore ways to run Python programs, peek at Python’s byte code execution model, 
and introduce the basics of module files for saving code. The goal will be to give you 
just enough information to run the examples and exercises in the rest of the book. You 
won't really start programming per se until Chapter 4, but make sure you have a handle 
on the startup details before moving on. 


Test Your Knowledge: Quiz 


In this edition of the book, we will be closing each chapter with a quick pop quiz about 
the material presented therein to help you review the key concepts. The answers for 
these quizzes appear immediately after the questions, and you are encouraged to read 
the answers once you've taken a crack at the questions yourself. In addition to these 
end-of-chapter quizzes, you'll find lab exercises at the end of each part of the book, 
designed to help you start coding Python on your own. For now, here’s your first test. 
Good luck! 

1. What are the six main reasons that people choose to use Python? 
2. Name four notable companies or organizations using Python today. 
3. Why might you not want to use Python in an application? 
4. What can you do with Python? 
5. What’s the significance of the Python import this statement? 
6. Why does “spam” show up in so many Python examples in books and on the Web? 
7. What is your favorite color? 


Test Your Knowledge: Answers 


How did you do? Here are the answers I came up with, though there may be multiple 
solutions to some quiz questions. Again, even if you’re sure you got a question right, I 
encourage you to look at these answers for additional context. See the chapter’s text 
for more details if any of these responses don’t make sense to you. 


1. Software quality, developer productivity, program portability, support libraries, 
component integration, and simple enjoyment. Of these, the quality and produc- 
tivity themes seem to be the main reasons that people choose to use Python. 


2. Google, Industrial Light & Magic, EVE Online, Jet Propulsion Labs, Maya, ESRI, 
and many more. Almost every organization doing software development uses Py- 
thon in some fashion, whether for long-term strategic product development or for 
short-term tactical tasks such as testing and system administration. 


3. Python’s downside is performance: it won’t run as quickly as fully compiled 
languages like C and C++. On the other hand, it’s quick enough for most applications, 
and typical Python code runs at close to C speed anyhow because it invokes 
linked-in C code in the interpreter. If speed is critical, compiled extensions are 
available for number-crunching parts of an application. 

4. You can use Python for nearly anything you can do with a computer, from website 
development and gaming to robotics and spacecraft control. 

5. import this triggers an Easter egg inside Python that displays some of the design 
philosophies underlying the language. You’ll learn how to run this statement in 
the next chapter. 

6. “Spam” is a reference from a famous Monty Python skit in which people trying to 
order food in a cafeteria are drowned out by a chorus of Vikings singing about 
spam. Oh, and it’s also a common variable name in Python scripts... 


7. Blue. No, yellow! 


Python Is Engineering, Not Art 


When Python first emerged on the software scene in the early 1990s, it spawned what 
is now something of a classic conflict between its proponents and those of another 
popular scripting language, Perl. Personally, I think the debate is tired and unwarranted 
today—developers are smart enough to draw their own conclusions. Still, this is one 
of the most common topics I’m asked about on the training road, so it seems fitting to 
say a few words about it here. 


The short story is this: you can do everything in Python that you can in Perl, but you can 
read your code after you do it. That’s it—their domains largely overlap, but Python is 
more focused on producing readable code. For many, the enhanced readability of Py- 
thon translates to better code reusability and maintainability, making Python a better 
choice for programs that will not be written once and thrown away. Perl code is easy 
to write, but difficult to read. Given that most software has a lifespan much longer than 
its initial creation, many see Python as a more effective tool. 


The somewhat longer story reflects the backgrounds of the designers of the two lan- 
guages and underscores some of the main reasons people choose to use Python. Py- 
thon’s creator is a mathematician by training; as such, he produced a language with a 
high degree of uniformity—its syntax and toolset are remarkably coherent. Moreover, 
like math, Python’s design is orthogonal—most of the language follows from a small 
set of core concepts. For instance, once one grasps Python’s flavor of polymorphism, 
the rest is largely just details. 


By contrast, the creator of the Perl language is a linguist, and its design reflects this 
heritage. There are many ways to accomplish the same tasks in Perl, and language 
constructs interact in context-sensitive and sometimes quite subtle ways—much like 
natural language. As the well-known Perl motto states, “There’s more than one way to 
do it.” Given this design, both the Perl language and its user community have histori- 
cally encouraged freedom of expression when writing code. One person’s Perl code can 
be radically different from another’s. In fact, writing unique, tricky code is often a 
source of pride among Perl users. 


But as anyone who has done any substantial code maintenance should be able to attest, 
freedom of expression is great for art, but lousy for engineering. In engineering, we need 
a minimal feature set and predictability. In engineering, freedom of expression can lead 
to maintenance nightmares. As more than one Perl user has confided to me, the result 
of too much freedom is often code that is much easier to rewrite from scratch than to 
modify. 


Consider this: when people create a painting or a sculpture, they do so for themselves 
for purely aesthetic purposes. The possibility of someone else having to change that 
painting or sculpture later does not enter into it. This is a critical difference between 
art and engineering. When people write software, they are not writing it for themselves. 
In fact, they are not even writing primarily for the computer. Rather, good programmers 
know that code is written for the next human being who has to read it in order to 
maintain or reuse it. If that person cannot understand the code, it’s all but useless in a 
realistic development scenario. 


This is where many people find that Python most clearly differentiates itself from 
scripting languages like Perl. Because Python’s syntax model almost forces users to 
write readable code, Python programs lend themselves more directly to the full software 
development cycle. And because Python emphasizes ideas such as limited interactions, 
code uniformity and regularity, and feature consistency, it more directly fosters code 
that can be used long after it is first written. 


In the long run, Python’s focus on code quality in itself boosts programmer produc- 
tivity, as well as programmer satisfaction. Python programmers can be creative, too, of 
course, and as we’ll see, the language does offer multiple solutions for some tasks. At 
its core, though, Python encourages good engineering in ways that other scripting lan- 
guages often do not. 


At least, that’s the common consensus among many people who have adopted Python. 
You should always judge such claims for yourself, of course, by learning what Python 
has to offer. To help you get started, let’s move on to the next chapter. 


CHAPTER 2 
How Python Runs Programs 


This chapter and the next take a quick look at program execution—how you launch 
code, and how Python runs it. In this chapter, we’ll study the Python interpreter. 
Chapter 3 will then show you how to get your own programs up and running. 


Startup details are inherently platform-specific, and some of the material in these two 
chapters may not apply to the platform you work on, so you should feel free to skip 
parts not relevant to your intended use. Likewise, more advanced readers who have 
used similar tools in the past and prefer to get to the meat of the language quickly may 
want to file some of this chapter away as “for future reference.” For the rest of you, let’s 
learn how to run some code. 


Introducing the Python Interpreter 


So far, I’ve mostly been talking about Python as a programming language. But, as cur- 
rently implemented, it’s also a software package called an interpreter. An interpreter is 
a kind of program that executes other programs. When you write a Python program, 
the Python interpreter reads your program and carries out the instructions it contains. 
In effect, the interpreter is a layer of software logic between your code and the computer 
hardware on your machine. 


When the Python package is installed on your machine, it generates a number of com- 
ponents—minimally, an interpreter and a support library. Depending on how you use 
it, the Python interpreter may take the form of an executable program, or a set of 
libraries linked into another program. Depending on which flavor of Python you run, 
the interpreter itself may be implemented as a C program, a set of Java classes, or 
something else. Whatever form it takes, the Python code you write must always be run 
by this interpreter. And to enable that, you must install a Python interpreter on your 
computer. 
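
If you already have a Python at hand and are curious which flavor of interpreter it is, a short sketch like the following can tell you. It relies on the standard library's platform and sys modules; the strings in the comments are examples of possible results, and the actual output depends on your installation:

```python
import platform
import sys

# The interpreter's flavor, for example 'CPython', 'Jython', or 'IronPython'.
implementation = platform.python_implementation()

# The language version this interpreter runs, as a "major.minor" string.
version = "%d.%d" % sys.version_info[:2]

print("Implementation:", implementation)
print("Version:", version)
print("Executable:", sys.executable)     # path of the interpreter program, if known
```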


Python installation details vary by platform and are covered in more depth in Appendix A. 
In short: 


* Windows users fetch and run a self-installing executable file that puts Python on 
their machines. Simply double-click and say Yes or Next at all prompts. 

* Linux and Mac OS X users probably already have a usable Python preinstalled on 
their computers; it’s a standard component on these platforms today. 

* Some Linux and Mac OS X users (and most Unix users) compile Python from its 
full source code distribution package. 

* Linux users can also find RPM files, and Mac OS X users can find various Mac-specific 
installation packages. 

* Other platforms have installation techniques relevant to those platforms. For 
instance, Python is available on cell phones, game consoles, and iPods, but installation 
details vary widely. 


Python itself may be fetched from the downloads page on the website, http://www.python.org. 
It may also be found through various other distribution channels. Keep in 
mind that you should always check to see whether Python is already present before 
installing it. If you’re working on Windows, you'll usually find Python in the Start 
menu, as captured in Figure 2-1 (these menu options are discussed in the next chapter). 
On Unix and Linux, Python probably lives in your /usr directory tree. 
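
One way to check for an existing installation from code is sketched below. It assumes that some Python 3.3 or later is already available to run it (shutil.which does not exist in older releases), and the two command names it tries are only common guesses, since names vary by platform:

```python
import shutil

# Look up each candidate command name on the system search path.
found = {}
for name in ("python", "python3"):        # assumed names; yours may differ
    found[name] = shutil.which(name)      # full path as a string, or None if absent

for name, path in found.items():
    print(name, "->", path or "not found on the search path")
```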


Because installation details are so platform-specific, we’ll finesse the rest of this story 
here. For more details on the installation process, consult Appendix A. For the purposes 
of this chapter and the next, I’ll assume that you’ve got Python ready to go. 


Program Execution 


What it means to write and run a Python script depends on whether you look at these 
tasks as a programmer, or as a Python interpreter. Both views offer important perspec- 
tives on Python programming. 


The Programmer's View 


In its simplest form, a Python program is just a text file containing Python statements. 
For example, the following file, named script0.py, is one of the simplest Python scripts 
I could dream up, but it passes for a fully functional Python program: 

print('hello world')
print(2 ** 100)

This file contains two Python print statements, which simply print a string (the text in 
quotes) and a numeric expression result (2 to the power 100) to the output stream. 
Don’t worry about the syntax of this code yet—for this chapter, we’re interested only 
in getting it to run. I’ll explain the print statement, and why you can raise 2 to the 
power 100 in Python without overflowing, in the next parts of this book. 

Figure 2-1. When installed on Windows, this is how Python shows up in your Start button menu. This 
can vary a bit from release to release, but IDLE starts a development GUI, and Python starts a simple 
interactive session. Also here are the standard manuals and the PyDoc documentation engine (Module 
Docs). 


You can create such a file of statements with any text editor you like. By convention, 
Python program files are given names that end in .py; technically, this naming scheme 
is required only for files that are “imported,” as shown later in this book, but most 
Python files have .py names for consistency. 


After you’ve typed these statements into a text file, you must tell Python to execute the 
file—which simply means to run all the statements in the file from top to bottom, one 
after another. As you’ll see in the next chapter, you can launch Python program files 
by shell command lines, by clicking their icons, from within IDEs, and with other 
standard techniques. If all goes well, when you execute the file, you'll see the results of 
the two print statements show up somewhere on your computer—by default, usually 
in the same window you were in when you ran the program: 


hello world 
1267650600228229401496703205376 


For example, here’s what happened when I ran this script from a DOS command line 
on a Windows laptop (typically called a Command Prompt window, found in the Ac- 
cessories program menu), to make sure it didn’t have any silly typos: 

C:\temp> python script0.py
hello world
1267650600228229401496703205376


We've just run a Python script that prints a string and a number. We probably won’t 
win any programming awards with this code, but it’s enough to capture the basics of 
program execution. 
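
As a rough sketch of what "telling Python to execute the file" amounts to, the following writes the same two statements to a temporary script0.py and launches it with the interpreter that is running the sketch. The helper name run_script_text and the temporary directory are my own assumptions for illustration; at a console you would simply type the command line shown above.

```python
import os
import subprocess
import sys
import tempfile

SCRIPT_TEXT = "print('hello world')\nprint(2 ** 100)\n"


def run_script_text(text):
    """Save text as script0.py in a scratch directory, run it, return its output lines."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "script0.py")
        with open(path, "w") as script_file:
            script_file.write(text)
        # sys.executable is the full path of the Python running this sketch,
        # so this is equivalent to typing "python script0.py" at a shell.
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            check=True,
        )
    return completed.stdout.splitlines()


output_lines = run_script_text(SCRIPT_TEXT)
for line in output_lines:
    print(line)
```

The statements run from top to bottom, so the string appears first and the number second.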


Python's View 


The brief description in the prior section is fairly standard for scripting languages, and 
it’s usually all that most Python programmers need to know. You type code into text 
files, and you run those files through the interpreter. Under the hood, though, a bit 
more happens when you tell Python to “go.” Although knowledge of Python internals 
is not strictly required for Python programming, a basic understanding of the runtime 
structure of Python can help you grasp the bigger picture of program execution. 


When you instruct Python to run your script, there are a few steps that Python carries 
out before your code actually starts crunching away. Specifically, it’s first compiled to 
something called “byte code” and then routed to something called a “virtual machine.” 


Byte code compilation 


Internally, and almost completely hidden from you, when you execute a program 
Python first compiles your source code (the statements in your file) into a format known 
as byte code. Compilation is simply a translation step, and byte code is a lower-level, 
platform-independent representation of your source code. Roughly, Python translates 
each of your source statements into a group of byte code instructions by decomposing 
them into individual steps. This byte code translation is performed to speed 
execution—byte code can be run much more quickly than the original source code 
statements in your text file. 
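
If you are curious, you can watch this translation step from Python itself. The sketch below uses the built-in compile function and the standard library's dis module to turn a small piece of source text into a code object and list the names of its byte code instructions. The source string and the variable names are my own; the exact instruction names you see depend on which Python release you run, so none are listed here.

```python
import dis

source = "x = 2\ny = x ** 100\nprint(y)\n"

# compile() performs the source-to-byte-code translation explicitly;
# Python normally does this for you when it runs or imports a file.
code_obj = compile(source, "<sketch>", "exec")

# The raw byte code is a bytes string; dis decodes it into readable steps.
raw_bytes = code_obj.co_code
opnames = [instruction.opname for instruction in dis.get_instructions(code_obj)]

print(len(raw_bytes), "bytes of byte code")
print(opnames)
```

Each source statement shows up as a short group of lower-level instructions, which is the decomposition described above.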


You'll notice that the prior paragraph said that this is almost completely hidden from 
you. If the Python process has write access on your machine, it will store the byte code 
of your programs in files that end with a .pyc extension (“.pyc” means compiled “.py” 
source). You will see these files show up on your computer after you’ve run a few 
programs alongside the corresponding source code files (that is, in the same 
directories). 


Python saves byte code like this as a startup speed optimization. The next time you run 
your program, Python will load the .pyc files and skip the compilation step, as long as 
you haven’t changed your source code since the byte code was last saved. Python au- 
tomatically checks the timestamps of source and byte code files to know when it must 
recompile—if you resave your source code, byte code is automatically re-created the 
next time your program is run. 
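
Python normally saves byte code files on its own, but you can also request one explicitly, which is a handy way to see where they end up. This is a minimal sketch using the standard library's py_compile module; the temporary directory and file name are assumptions made for the example. Note that on Python 3.2 and later the saved file lands in a __pycache__ subdirectory rather than directly beside the source, so the location printed depends on your version.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "script0.py")
    with open(source_path, "w") as source_file:
        source_file.write("print('hello world')\n")

    # Compile the source file and save its byte code to disk, much as
    # Python does automatically for modules that are imported.
    pyc_path = py_compile.compile(source_path, doraise=True)

    pyc_name = os.path.basename(pyc_path)
    pyc_saved = os.path.exists(pyc_path)
    print("byte code saved as:", pyc_path)
```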


If Python cannot write the byte code files to your machine, your program still works— 
the byte code is generated in memory and simply discarded on program exit.* However, 
because .pyc files speed startup time, you'll want to make sure they are written for larger 
programs. Byte code files are also one way to ship Python programs—Python is happy 
to run a program if all it can find are .pyc files, even if the original .py source files are 
absent. (See “Frozen Binaries” on page 32 for another shipping option.) 


The Python Virtual Machine (PVM) 


Once your program has been compiled to byte code (or the byte code has been loaded 
from existing .pyc files), it is shipped off for execution to something generally known 
as the Python Virtual Machine (PVM, for the more acronym-inclined among you). The 
PVM sounds more impressive than it is; really, it’s not a separate program, and it need 
not be installed by itself. In fact, the PVM is just a big loop that iterates through your 
byte code instructions, one by one, to carry out their operations. The PVM is the run- 
time engine of Python; it’s always present as part of the Python system, and it’s the 
component that truly runs your scripts. Technically, it’s just the last step of what is 
called the “Python interpreter.” 
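
To make the "big loop" idea concrete, here is a toy sketch of a stack-based instruction loop. It is only an analogy: the instruction names PUSH, POWER, and PRINT and the run function are invented for this example and are not CPython's real byte code or its actual PVM.

```python
def run(instructions):
    """Step through (operation, argument) pairs one by one, like a tiny virtual machine."""
    stack = []
    output = []
    for operation, argument in instructions:
        if operation == "PUSH":
            stack.append(argument)
        elif operation == "POWER":
            exponent = stack.pop()
            base = stack.pop()
            stack.append(base ** exponent)
        elif operation == "PRINT":
            output.append(str(stack.pop()))
        else:
            raise ValueError("unknown instruction: %r" % (operation,))
    return output


# A made-up instruction sequence standing in for: print(2 ** 100)
program = [("PUSH", 2), ("PUSH", 100), ("POWER", None), ("PRINT", None)]
result_lines = run(program)
print(result_lines[0])
```

The real PVM is far more elaborate, but its core has the same shape: fetch the next instruction, carry it out, and repeat.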


Figure 2-2 illustrates the runtime structure described here. Keep in mind that all of this 
complexity is deliberately hidden from Python programmers. Byte code compilation is 
automatic, and the PVM is just part of the Python system that you have installed on 
your machine. Again, programmers simply code and run files of statements. 


Performance implications 


Readers with a background in fully compiled languages such as C and C++ might notice 
a few differences in the Python model. For one thing, there is usually no build or “make” 
step in Python work: code runs immediately after it is written. For another, Python byte 
code is not binary machine code (e.g., instructions for an Intel chip). Byte code is a 
Python-specific representation. 


* And, strictly speaking, byte code is saved only for files that are imported, not for the top-level file of a program. 
We'll explore imports in Chapter 3, and again in Part V. Byte code is also never saved for code typed at the 
interactive prompt, which is described in Chapter 3. 




[Figure 2-2 diagram: Source (m.py) → Byte code (m.pyc) → Runtime (PVM)]


Figure 2-2. Python’s traditional runtime execution model: source code you type is translated to byte 
code, which is then run by the Python Virtual Machine. Your code is automatically compiled, but then 
it is interpreted. 


This is why some Python code may not run as fast as C or C++ code, as described in 
Chapter 1—the PVM loop, not the CPU chip, still must interpret the byte code, and 
byte code instructions require more work than CPU instructions. On the other hand, 
unlike in classic interpreters, there is still an internal compile step—Python does not 
need to reanalyze and reparse each source statement repeatedly. The net effect is that 
pure Python code runs at speeds somewhere between those of a traditional compiled 
language and a traditional interpreted language. See Chapter 1 for more on Python 
performance tradeoffs. 


Development implications 


Another ramification of Python’s execution model is that there is really no distinction 
between the development and execution environments. That is, the systems that com- 
pile and execute your source code are really one and the same. This similarity may have 
a bit more significance to readers with a background in traditional compiled languages, 
but in Python, the compiler is always present at runtime and is part of the system that 
runs programs. 


This makes for a much more rapid development cycle. There is no need to precompile 
and link before execution may begin; simply type and run the code. This also adds a 
much more dynamic flavor to the language—it is possible, and often very convenient, 
for Python programs to construct and execute other Python programs at runtime. The 
eval and exec built-ins, for instance, accept and run strings containing Python program 
code. This structure is also why Python lends itself to product customization—because 
Python code can be changed on the fly, users can modify the Python parts of a system 
onsite without needing to have or compile the entire system’s code. 
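
As a small sketch of this dynamic flavor, the code below hands strings of Python code to the eval and exec built-ins. The strings and the names namespace and greet are assumptions chosen for the example; eval computes the value of an expression, while exec runs statements.

```python
# eval evaluates a string containing a Python expression and returns its value.
power = eval("2 ** 8")

# exec runs a string containing Python statements; names the code creates
# are stored in the dictionary passed as its namespace.
namespace = {}
exec("def greet(name):\n    return 'hello ' + name\n", namespace)
greeting = namespace["greet"]("world")

print(power)
print(greeting)
```

Because both built-ins run whatever code they are given, they are best reserved for strings that come from a source you trust.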


At a more fundamental level, keep in mind that all we really have in Python is runtime: 
there is no initial compile-time phase at all, and everything happens as the program is 
running. This even includes operations such as the creation of functions and classes 
and the linkage of modules. Such events occur before execution in more static lan- 
guages, but happen as programs execute in Python. As we'll see, the net effect makes 
for a much more dynamic programming experience than that to which some readers 
may be accustomed. 
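
For instance, a def is itself a statement that runs when the program reaches it, so a program can decide which function to create while it executes. The following is a minimal sketch; the names make_greeter and greet are hypothetical and exist only for this illustration.

```python
def make_greeter(formal):
    """Create and return a function; which def runs is decided during execution."""
    if formal:
        def greet(name):
            return 'Good day, ' + name
    else:
        def greet(name):
            return 'hi ' + name
    return greet


casual_greeter = make_greeter(False)
formal_greeter = make_greeter(True)

print(casual_greeter('Bob'))
print(formal_greeter('Bob'))
```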
Execution Model Variations 


Before moving on, I should point out that the internal execution flow described in the 
prior section reflects the standard implementation of Python today but is not really a 
requirement of the Python language itself. Because of that, the execution model is prone 
to changing with time. In fact, there are already a few systems that modify the picture 
in Figure 2-2 somewhat. Let’s take a few moments to explore the most prominent of 
these variations. 


Python Implementation Alternatives 


Really, as this book is being written, there are three primary implementations of the 
Python language—CPython, Jython, and IronPython—along with a handful of secon- 
dary implementations such as Stackless Python. In brief, CPython is the standard im- 
plementation; all the others have very specific purposes and roles. All implement the 
same Python language but execute programs in different ways. 


CPython 


The original, and standard, implementation of Python is usually called CPython, when 
you want to contrast it with the other two. Its name comes from the fact that it is coded 
in portable ANSI C language code. This is the Python that you fetch from http://www.python.org, 
get with the ActivePython distribution, and have automatically on most 
Linux and Mac OS X machines. If you’ve found a preinstalled version of Python on 
your machine, it’s probably CPython, unless your company is using Python in very 
specialized ways. 


Unless you want to script Java or .NET applications with Python, you probably want 
to use the standard CPython system. Because it is the reference implementation of the 
language, it tends to run the fastest, be the most complete, and be more robust than 
the alternative systems. Figure 2-2 reflects CPython’s runtime architecture. 
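
If you are unsure which implementation you have, Python can tell you. This short sketch uses the standard library's platform module; on a standard install I would expect it to report CPython, but the result naturally depends on the system running it.

```python
import platform

# Name of the implementation running this code, for example 'CPython'.
implementation = platform.python_implementation()

# The language version it implements, as a string such as '3.0.1'.
version = platform.python_version()

print(implementation, version)
```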


Jython 


The Jython system (originally known as JPython) is an alternative implementation of 
the Python language, targeted for integration with the Java programming language. 
Jython consists of Java classes that compile Python source code to Java byte code and 
then route the resulting byte code to the Java Virtual Machine (JVM). Programmers 
still code Python statements in .py text files as usual; the Jython system essentially just 
replaces the rightmost two bubbles in Figure 2-2 with Java-based equivalents. 


Jython’s goal is to allow Python code to script Java applications, much as CPython 
allows Python to script C and C++ components. Its integration with Java is remarkably 
seamless. Because Python code is translated to Java byte code, it looks and feels like a 
true Java program at runtime. Jython scripts can serve as web applets and servlets, build 
Java-based GUIs, and so on. Moreover, Jython includes integration support that allows 
Python code to import and use Java classes as though they were coded in Python. 
Because Jython is slower and less robust than CPython, though, it is usually seen as a 
tool of interest primarily to Java developers looking for a scripting language to be a 
frontend to Java code. 


IronPython 


A third implementation of Python, and newer than both CPython and Jython, 
IronPython is designed to allow Python programs to integrate with applications coded 
to work with Microsoft’s .NET Framework for Windows, as well as the Mono open 
source equivalent for Linux. .NET and its C# programming language runtime system 
are designed to be a language-neutral object communication layer, in the spirit of Mi- 
crosoft’s earlier COM model. IronPython allows Python programs to act as both client 
and server components, accessible from other .NET languages. 


By implementation, IronPython is very much like Jython (and, in fact, was developed 
by the same creator)—it replaces the last two bubbles in Figure 2-2 with equivalents 
for execution in the .NET environment. Also, like Jython, IronPython has a special 
focus—it is primarily of interest to developers integrating Python with .NET compo- 
nents. Because it is being developed by Microsoft, though, IronPython might also be 
able to leverage some important optimization tools for better performance. 
IronPython’s scope is still evolving as I write this; for more details, consult the Python 
online resources or search the Web.† 


Execution Optimization Tools 


CPython, Jython, and IronPython all implement the Python language in similar ways: 
by compiling source code to byte code and executing the byte code on an appropriate 
virtual machine. Still other systems, including the Psyco just-in-time compiler and the 
Shedskin C++ translator, instead attempt to optimize the basic execution model. These 
systems are not required knowledge at this point in your Python career, but a quick 
look at their place in the execution model might help demystify the model in general. 


The Psyco just-in-time compiler 


The Psyco system is not another Python implementation, but rather a component that 
extends the byte code execution model to make programs run faster. In terms of 
Figure 2-2, Psyco is an enhancement to the PVM that collects and uses type information 
while the program runs to translate portions of the program’s byte code all the way 
down to real binary machine code for faster execution. Psyco accomplishes this 


† Jython and IronPython are completely independent implementations of Python that compile Python source 
for different runtime architectures. It is also possible to access Java and .NET software from standard CPython 
programs: JPype and Python for .NET systems, for example, allow CPython code to call out to Java and .NET 
components. 
translation without requiring changes to the code or a separate compilation step during 
development. 


Roughly, while your program runs, Psyco collects information about the kinds of ob- 
jects being passed around; that information can be used to generate highly efficient 
machine code tailored for those object types. Once generated, the machine code then 
replaces the corresponding part of the original byte code to speed your program’s over- 
all execution. The net effect is that, with Psyco, your program becomes much quicker 
over time and as it is running. In ideal cases, some Python code may become as fast as 
compiled C code under Psyco. 


Because this translation from byte code happens at program runtime, Psyco is generally 
known as a just-in-time (JIT) compiler. Psyco is actually a bit different from the JIT 
compilers some readers may have seen for the Java language, though. Really, Psyco is 
a specializing JIT compiler—it generates machine code tailored to the data types that 
your program actually uses. For example, if a part of your program uses different data 
types at different times, Psyco may generate a different version of machine code to 
support each different type combination. 


Psyco has been shown to speed Python code dramatically. According to its web page, 
Psyco provides “2x to 100x speed-ups, typically 4x, with an unmodified Python inter- 
preter and unmodified source code, just a dynamically loadable C extension module.” 
Of equal significance, the largest speedups are realized for algorithmic code written in 
pure Python—exactly the sort of code you might normally migrate to C to optimize. 
With Psyco, such migrations become even less important. 


Psyco is not yet a standard part of Python; you will have to fetch and install it separately. 
It is also still something of a research project, so you'll have to track its evolution online. 
In fact, at this writing, although Psyco can still be fetched and installed by itself, it 
appears that much of the system may eventually be absorbed into the newer “PyPy” 
project—an attempt to reimplement Python’s PVM in Python code, to better support 
optimizations like Psyco. 


Perhaps the largest downside of Psyco is that it currently only generates machine code 
for Intel x86 architecture chips, though this includes Windows and Linux boxes and 
recent Macs. For more details on the Psyco extension, and other JIT efforts that may 
arise, consult http://www.python.org; you can also check out Psyco’s home page, which 
currently resides at http://psyco.sourceforge.net. 


The Shedskin C++ translator 


Shedskin is an emerging system that takes a different approach to Python program 
execution—it attempts to translate Python source code to C++ code, which your com- 
puter’s C++ compiler then compiles to machine code. As such, it represents a platform- 
neutral approach to running Python code. Shedskin is still somewhat experimental as 
I write these words, and it limits Python programs to an implicit statically typed con- 
straint that is technically not normal Python, so we won’t go into further detail here. 




Initial results, though, show that it has the potential to outperform both standard Py- 
thon and the Psyco extension in terms of execution speed, and it is a promising project. 
Search the Web for details on the project’s current status. 


Frozen Binaries 


Sometimes when people ask for a “real” Python compiler, what they’re really seeking 
is simply a way to generate standalone binary executables from their Python programs. 
This is more a packaging and shipping idea than an execution-flow concept, but it’s 
somewhat related. With the help of third-party tools that you can fetch off the Web, it 
is possible to turn your Python programs into true executables, known as frozen bi- 
naries in the Python world. 


Frozen binaries bundle together the byte code of your program files, along with the 
PVM (interpreter) and any Python support files your program needs, into a single 
package. There are some variations on this theme, but the end result can be a single 
binary executable program (e.g., an .exe file on Windows) that can easily be shipped 
to customers. In Figure 2-2, it is as though the byte code and PVM are merged into a 
single component—a frozen binary file. 


Today, three primary systems are capable of generating frozen binaries: py2exe (for 
Windows), PyInstaller (which is similar to py2exe but also works on Linux and Unix 
and is capable of generating self-installing binaries), and freeze (the original). You may 
have to fetch these tools separately from Python itself, but they are available free of 
charge. They are also constantly evolving, so consult http://www.python.org or your 
favorite web search engine for more on these tools. To give you an idea of the scope of 
these systems, py2exe can freeze standalone programs that use the tkinter, PMW, 
wxPython, and PyGTK GUI libraries; programs that use the pygame game program- 
ming toolkit; win32com client programs; and more. 


Frozen binaries are not the same as the output of a true compiler—they run byte code 
through a virtual machine. Hence, apart from a possible startup improvement, frozen 
binaries run at the same speed as the original source files. Frozen binaries are not small 
(they contain a PVM), but by current standards they are not unusually large either. 
Because Python is embedded in the frozen binary, though, it does not have to be in- 
stalled on the receiving end to run your program. Moreover, because your code is em- 
bedded in the frozen binary, it is more effectively hidden from recipients. 
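
A program can sometimes tell whether it is running from a frozen binary. Several freezing tools, PyInstaller and py2exe among them, set a frozen attribute on the sys module. That is a convention of those tools rather than part of the Python language, so treat the check below as a sketch under that assumption; the function name running_frozen is my own.

```python
import sys


def running_frozen():
    """Return True if a freezing tool has marked this process as a frozen binary."""
    # Ordinary interpreters do not define sys.frozen, so default to False.
    return bool(getattr(sys, "frozen", False))


if running_frozen():
    print("running from a frozen binary")
else:
    print("running under a normally installed Python")
```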


This single file-packaging scheme is especially appealing to developers of commercial 
software. For instance, a Python-coded user interface program based on the tkinter 
toolkit can be frozen into an executable file and shipped as a self-contained program 
on a CD or on the Web. End users do not need to install (or even have to know about) 
Python to run the shipped program. 
Other Execution Options 


Still other schemes for running Python programs have more focused goals: 


* The Stackless Python system is a standard CPython implementation variant that 
does not save state on the C language call stack. This makes Python easier to 
port to small stack architectures, provides efficient multiprocessing options, and 
fosters novel programming structures such as coroutines. 


* The Cython system (based on work done by the Pyrex project) is a hybrid language 
that combines Python code with the ability to call C functions and use C type 
declarations for variables, parameters, and class attributes. Cython code can be 
compiled to C code that uses the Python/C API, which may then be compiled 
completely. Though not completely compatible with standard Python, Cython can 
be useful both for wrapping external C libraries and for coding efficient C exten- 
sions for Python. 


For more details on these systems, search the Web for recent links. 


Future Possibilities? 


Finally, note that the runtime execution model sketched here is really an artifact of the 
current implementation of Python, not of the language itself. For instance, it’s not 
impossible that a full, traditional compiler for translating Python source code to ma- 
chine code may appear during the shelf life of this book (although one has not in nearly 
two decades!). New byte code formats and implementation variants may also be adop- 
ted in the future. For instance: 


* The Parrot project aims to provide a common byte code format, virtual machine, 
and optimization techniques for a variety of programming languages (see 
http://www.python.org). Python’s own PVM runs Python code more efficiently than 
Parrot, but it’s unclear how Parrot will evolve. 


* The PyPy project is an attempt to reimplement the PVM in Python itself to enable 
new implementation techniques. Its goal is to produce a fast and flexible imple- 
mentation of Python. 


* The Google-sponsored Unladen Swallow project aims to make standard Python 
faster by a factor of at least 5, and fast enough to replace the C language in many 
contexts. It is an optimization branch of CPython, intended to be fully compatible 
and significantly faster. This project also hopes to remove the Python multithread- 
ing Global Interpreter Lock (GIL), which prevents pure Python threads from truly 
overlapping in time. This is currently an emerging project being developed as open 
source by Google engineers; it is initially targeting Python 2.6, though 3.0 may 
acquire its changes too. Search Google for up-to-date details. 


Although such future implementation schemes may alter the runtime structure of Py- 
thon somewhat, it seems likely that the byte code compiler will still be the standard for 
some time to come. The portability and runtime flexibility of byte code are important 
features of many Python systems. Moreover, adding type constraint declarations to 
support static compilation would break the flexibility, conciseness, simplicity, and 
overall spirit of Python coding. Due to Python’s highly dynamic nature, any future 
implementation will likely retain many artifacts of the current PVM. 


Chapter Summary 


This chapter introduced the execution model of Python (how Python runs your pro- 
grams) and explored some common variations on that model (just-in-time compilers 
and the like). Although you don’t really need to come to grips with Python internals to 
write Python scripts, a passing acquaintance with this chapter’s topics will help you 
truly understand how your programs run once you start coding them. In the next 
chapter, you'll start actually running some code of your own. First, though, here’s the 
usual chapter quiz. 


Test Your Knowledge: Quiz 


1. What is the Python interpreter? 

2. What is source code? 

3. What is byte code? 

4. What is the PVM? 

5. Name two variations on Python’s standard execution model. 

6. How are CPython, Jython, and IronPython different? 


Test Your Knowledge: Answers 


1. The Python interpreter is a program that runs the Python programs you write. 

2. Source code is the statements you write for your program—it consists of text in 
text files that normally end with a .py extension. 

3. Byte code is the lower-level form of your program after Python compiles it. Python 
automatically stores byte code in files with a .pyc extension. 

4. The PVM is the Python Virtual Machine—the runtime engine of Python that in- 
terprets your compiled byte code. 

5. Psyco, Shedskin, and frozen binaries are all variations on the execution model. 

6. CPython is the standard implementation of the language. Jython and IronPython 
implement Python programs for use in Java and .NET environments, respectively; 
they are alternative compilers for Python. 




CHAPTER 3 
How You Run Programs 


OK, it’s time to start running some code. Now that you have a handle on program 
execution, you're finally ready to start some real Python programming. At this point, 
I’ll assume that you have Python installed on your computer; if not, see the prior chapter 
and Appendix A for installation and configuration hints. 


There are a variety of ways to tell Python to execute the code you type. This chapter 
discusses all the program launching techniques in common use today. Along the way, 
you'll learn how to type code interactively and how to save it in files to be run with 
system command lines, icon clicks, module imports and reloads, exec calls, menu op- 
tions in GUIs such as IDLE, and more. 


If you just want to find out how to run a Python program quickly, you may be tempted 
to read the parts of this chapter that pertain only to your platform and move on to 
Chapter 4. But don’t skip the material on module imports, as that’s essential to un- 
derstanding Python’s program architecture. I also encourage you to at least skim the 
sections on IDLE and other IDEs, so you’ll know what tools are available for when you 
start developing more sophisticated Python programs. 


The Interactive Prompt 


Perhaps the simplest way to run Python programs is to type them at Python’s interactive 
command line, sometimes called the interactive prompt. There are a variety of ways to 
start this command line: in an IDE, from a system console, and so on. Assuming the 
interpreter is installed as an executable program on your system, the most platform- 
neutral way to start an interactive interpreter session is usually just to type python at 
your operating system’s prompt, without any arguments. For example: 
% python
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>>


Typing the word “python” at your system shell prompt like this begins an interactive 
Python session; the “%” character at the start of this listing stands for a generic system 
prompt in this book; it’s not input that you type yourself. The notion of a system shell 
prompt is generic, but exactly how you access it varies by platform: 


* On Windows, you can type python in a DOS console window (a.k.a. the Command 
Prompt, usually found in the Accessories section of the Start>Programs menu) or 
in the Start>Run... dialog box. 


* On Unix, Linux, and Mac OS X, you might type this command in a shell or terminal 
window (e.g., in an xterm or console running a shell such as ksh or csh). 


* Other systems may use similar or platform-specific devices. On handheld devices, 
for example, you generally click the Python icon in the home or application window 
to launch an interactive session. 


If you have not set your shell’s PATH environment variable to include Python’s install 
directory, you may need to replace the word “python” with the full path to the Python 
executable on your machine. On Unix, Linux, and similar, /usr/local/bin/python 
or /usr/bin/python will often suffice. On Windows, try typing C:\Python30\python (for 
version 3.0): 

C:\misc> c:\python30\python
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>>


Alternatively, you can run a change-directory command to go to Python’s install di- 
rectory before typing “python”—try the cd c:\python30 command on Windows, for 
example: 

C:\misc> cd C:\Python30
C:\Python30> python
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>> 
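
If you can start Python at all (from IDLE or a Start menu entry, say), you can ask it where it lives and whether the plain python command would be found on your PATH. This is a small sketch using the standard library's sys and shutil modules; the command name "python" is taken from the discussion above, and the paths printed will differ from machine to machine.

```python
import shutil
import sys

# Full path of the interpreter that is running this code.
current_interpreter = sys.executable

# Search the directories on PATH for a program named "python";
# the result is None if no such program is found.
on_path = shutil.which("python")

if on_path is None:
    print("'python' is not on PATH; try the full path:", current_interpreter)
else:
    print("'python' on PATH resolves to:", on_path)
```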


On Windows, besides typing python in a shell window, you can also begin similar 
interactive sessions by starting IDLE’s main window (discussed later) or by selecting 
the “Python (command line)” menu option from the Start button menu for Python, as 
shown in Figure 2-1 back in Chapter 2. Both spawn a Python interactive prompt with 
equivalent functionality; typing a shell command isn’t necessary. 
Running Code Interactively 


However it’s started, the Python interactive session begins by printing two lines of 
informational text (which I’ll omit from most of this book’s examples to save space), 
then prompts for input with >>> when it’s waiting for you to type a new Python statement 
or expression. When working interactively, the results of your code are displayed 
below the >>> input lines once you press the Enter key. 


For instance, here are the results of two Python print statements (print is really a 
function call in Python 3.0, but not in 2.6, so the parentheses here are required in 3.0 
only): 

% python
>>> print('Hello world!')
Hello world!
>>> print(2 ** 8)
256

Again, you don’t need to worry about the details of the print statements shown here 
yet; we'll start digging into syntax in the next chapter. In short, they print a Python 
string and an integer, as shown by the output lines that appear after each >>> input line 
(2 ** 8 means 2 raised to the power 8 in Python). 


When coding interactively like this, you can type as many Python commands as you 
like; each is run immediately after it’s entered. Moreover, because the interactive ses- 
sion automatically prints the results of expressions you type, you don’t usually need to 
say “print” explicitly at this prompt: 

>>> lumberjack = 'okay'
>>> lumberjack
'okay'
>>> 2 ** 8
256
>>>                  <== Use Ctrl-D (on Unix) or Ctrl-Z (on Windows) to exit
%


Here, the first line saves a value by assigning it to a variable, and the last two lines typed 
are expressions (lumberjack and 2 ** 8)—their results are displayed automatically. To 
exit an interactive session like this one and return to your system shell prompt, type 
Ctrl-D on Unix-like machines; on MS-DOS and Windows systems, type Ctrl-Z to exit. 
In the IDLE GUI discussed later, either type Ctrl-D or simply close the window. 
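
A side note on the quotes in the echoed 'okay' above: the interactive prompt displays an expression's result in its as-code form, which is the same text the built-in repr function returns, while print shows the friendlier form that str returns. The following is just a sketch of that difference for illustration; the names echoed and printed are made up here and are not part of the session above.

```python
lumberjack = 'okay'

echoed = repr(lumberjack)    # the form the interactive prompt displays
printed = str(lumberjack)    # the form print() displays

print(echoed)                # shown with surrounding quotes
print(printed)               # shown without quotes
```

For numbers such as the result of 2 ** 8, the two forms look the same, which is why the difference only shows up for the string here.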


Now, we didn’t do much in this session’s code—just typed some Python print and 
assignment statements, along with a few expressions, which we’ll study in detail later. 
The main thing to notice is that the interpreter executes the code entered on each line 
immediately, when the Enter key is pressed. 


For example, when we typed the first print statement at the >>> prompt, the output (a 
Python string) was echoed back right away. There was no need to create a source-code 
file, and no need to run the code through a compiler and linker first, as you’d normally 
do when using a language such as C or C++. As you'll see in later chapters, you can 
also run multiline statements at the interactive prompt; such a statement runs imme- 
diately after you’ve entered all of its lines and pressed Enter twice to add a blank line. 


Why the Interactive Prompt? 


The interactive prompt runs code and echoes results as you go, but it doesn’t save your 
code in a file. Although this means you won’t do the bulk of your coding in interactive 
sessions, the interactive prompt turns out to be a great place to both experiment with 
the language and test program files on the fly. 


Experimenting 


Because code is executed immediately, the interactive prompt is a perfect place to ex- 
periment with the language and will be used often in this book to demonstrate smaller 
examples. In fact, this is the first rule of thumb to remember: if you’re ever in doubt 
about how a piece of Python code works, fire up the interactive command line and try 
it out to see what happens. 


For instance, suppose you're reading a Python program’s code and you come across 
an expression like 'Spam!' * 8 whose meaning you don’t understand. At this point, 
you can spend 10 minutes wading through manuals and books to try to figure out what 
the code does, or you can simply run it interactively: 

>>> 'Spam!' * 8 <== Learning by trying 

'Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!' 


The immediate feedback you receive at the interactive prompt is often the quickest way 
to deduce what a piece of code does. Here, it’s clear that it does string repetition: in 
Python * means multiply for numbers, but repeat for strings—it’s like concatenating a 
string to itself repeatedly (more on strings in Chapter 4). 
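
To make the "concatenating a string to itself repeatedly" description concrete, here is a minimal sketch that builds the same result both ways. The variable names are my own, and the loop is only there for comparison; in practice you would simply use the * expression.

```python
text = 'Spam!'

repeated = text * 8              # string repetition

concatenated = ''
for _ in range(8):               # the same thing, done the long way
    concatenated = concatenated + text

print(repeated == concatenated)  # the two strings are identical
print(2 * 8)                     # for numbers, * still means multiply
```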


Chances are good that you won’t break anything by experimenting this way—at least, 
not yet. To do real damage, like deleting files and running shell commands, you must 
really try, by importing modules explicitly (you also need to know more about Python’s 
system interfaces in general before you will become that dangerous!). Straight Python 
code is almost always safe to run. 


For instance, watch what happens when you make a mistake at the interactive prompt: 

>>> X <== Making mistakes 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
NameError: name 'X' is not defined 


In Python, using a variable before it has been assigned a value is always an error (oth- 
erwise, if names were filled in with defaults, some errors might go undetected). We’ll 
learn more about that later; the important point here is that you don’t crash Python or 
your computer when you make a mistake this way. Instead, you get a meaningful error 
message pointing out the mistake and the line of code that made it, and you can con- 
tinue on in your session or script. In fact, once you get comfortable with Python, its 
error messages may often provide as much debugging support as you’ll need (you'll 
read more on debugging in the sidebar “Debugging Python Code” on page 67). 
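
As a preview only, the error shown above is an ordinary Python object that a program can intercept and inspect, which is part of why a mistake like this doesn't crash Python. The sketch below uses the try statement, which this chapter has not introduced, so treat it as an illustration rather than something you need to understand yet; the name message is made up for the example.

```python
try:
    X                            # X has never been assigned a value
except NameError as exc:
    message = str(exc)           # the text shown after "NameError:"

print(message)
print('Still running')           # the program carries on after the error
```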


Testing 


Besides serving as a tool for experimenting while you’re learning the language, the 
interactive interpreter is also an ideal place to test code you’ve written in files. You can 
import your module files interactively and run tests on the tools they define by typing 
calls at the interactive prompt. 


For instance, the following tests a function in a precoded module that ships with 
Python in its standard library (it prints the name of the directory you’re currently 
working in), but you can do the same once you start writing module files of your own: 

>>> import os 
>>> os.getcwd() <== Testing on the fly 
'c:\\Python30' 


More generally, the interactive prompt is a place to test program components, regard- 
less of their source—you can import and test functions and classes in your Python files, 
type calls to linked-in C functions, exercise Java classes under Jython, and more. Partly 
because of its interactive nature, Python supports an experimental and exploratory 
programming style you’ll find convenient when getting started. 
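
The same kind of on-the-fly test can go a step further by checking a call's result against something you expect to be true. The following sketch assumes nothing beyond the standard os module; the checks I chose (that the result is an absolute path naming a real directory) are just examples of the sort of quick test you might type.

```python
import os

cwd = os.getcwd()                # the call being tested
print(cwd)

print(os.path.isabs(cwd))        # is the result an absolute path?
print(os.path.isdir(cwd))        # does it name an existing directory?
```

At the interactive prompt you could type these expressions without print and simply read the echoed results.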


Using the Interactive Prompt 


Although the interactive prompt is simple to use, there are a few tips that beginners 
should keep in mind. I’m including lists of common mistakes like this in this chapter 
for reference, but they might also spare you from a few headaches if you read them up 
front: 


• Type Python commands only. First of all, remember that you can only type Python 
code at the Python prompt, not system commands. There are ways to run 
system commands from within Python code (e.g., with os.system), but they are 
not as direct as simply typing the commands themselves. 


• print statements are required only in files. Because the interactive interpreter 
automatically prints the results of expressions, you do not need to type complete 
print statements interactively. This is a nice feature, but it tends to confuse users 
when they move on to writing code in files: within a code file, you must use 
print statements to see your output because expression results are not automati- 
cally echoed. Remember, you must say print in files, but not interactively. 


• Don’t indent at the interactive prompt (yet). When typing Python programs, 
either interactively or into a text file, be sure to start all your unnested statements 
in column 1 (that is, all the way to the left). If you don’t, Python may print a 
“SyntaxError” message (often the more specific “IndentationError: unexpected indent”), because blank space to the left of your code is taken to be 
indentation that groups nested statements. Until Chapter 10, all statements you 
write will be unnested, so this includes everything for now. This seems to be a 
recurring confusion in introductory Python classes. Remember, a leading space 
generates an error message. 


• Watch out for prompt changes for compound statements. We won’t meet 
compound (multiline) statements until Chapter 4, and not in earnest until Chap- 
ter 10, but as a preview, you should know that when typing lines 2 and beyond of 
a compound statement interactively, the prompt may change. In the simple shell 
window interface, the interactive prompt changes to ... instead of >>> for lines 2 
and beyond; in the IDLE interface, lines after the first are automatically indented. 


You'll see why this matters in Chapter 10. For now, if you happen to come across 
a ... prompt or a blank line when entering your code, it probably means that you've 
somehow confused interactive Python into thinking you’re typing a multiline 
statement. Try hitting the Enter key or a Ctrl-C combination to get back to the 
main prompt. The >>> and ... prompt strings can also be changed (they are available 
in the built-in module sys, as sys.ps1 and sys.ps2), but I’ll assume they have not 
been in the book’s example listings. 


• Terminate compound statements at the interactive prompt with a blank 
line. At the interactive prompt, inserting a blank line (by hitting the Enter key at 
the start of a line) is necessary to tell interactive Python that you’re done typing the 
multiline statement. That is, you must press Enter twice to make a compound 
statement run. By contrast, blank lines are not required in files and are simply 
ignored if present. If you don’t press Enter twice at the end of a compound statement 
when working interactively, you'll appear to be stuck in a limbo state, because 
the interactive interpreter will do nothing at all—it’s waiting for you to press Enter 
again! 

• The interactive prompt runs one statement at a time. At the interactive prompt, 
you must run one statement to completion before typing another. This is natural 
for simple statements, because pressing the Enter key runs the statement entered. 
For compound statements, though, remember that you must submit a blank line 
to terminate the statement and make it run before you can type the next statement. 
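
The first tip in this list mentions os.system as one way to run a system command from within Python code. Here is a minimal sketch of what that looks like; it assumes an echo command is available in your system shell (it is in Unix shells and in the Windows Command Prompt), and the name status is my own.

```python
import os

# Hand a command line to the system shell; its output appears directly
# in the console rather than being returned to Python.
status = os.system('echo Hello from the system shell')

print(status)                    # the command's exit status; 0 normally means success
```

Notice the extra steps compared with typing echo at the system prompt itself: an import, a function call, and a quoted string.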


Entering multiline statements 


At the risk of repeating myself, I received emails from readers who’d gotten burned by 
the last two points as I was updating this chapter, so they probably merit emphasis. I’ll 
introduce multiline (a.k.a. compound) statements in the next chapter, and we’ll explore 
their syntax more formally later in this book. Because their behavior differs slightly in 
files and at the interactive prompt, though, two cautions are in order here. 


First, be sure to terminate multiline compound statements like for loops and if tests 
at the interactive prompt with a blank line. You must press the Enter key twice, to ter- 
minate the whole multiline statement and then make it run. For example (pun not 
intended...): 


>>> for x in 'spam': 
...     print(x) <== Press Enter twice here to make this loop run 
... 


You don’t need the blank line after compound statements in a script file, though; this 
is required only at the interactive prompt. In a file, blank lines are not required and are 
simply ignored when present; at the interactive prompt, they terminate multiline 
statements. 


Also bear in mind that the interactive prompt runs just one statement at a time: you 
must press Enter twice to run a loop or other multiline statement before you can type 
the next statement: 

>>> for x in 'spam': 
...     print(x) <== Need to press Enter twice before a new statement 
... print('done') 
  File "<stdin>", line 3 
    print('done') 
        ^ 
SyntaxError: invalid syntax 


This means you can’t cut and paste multiple lines of code into the interactive prompt, 
unless the code includes blank lines after each compound statement. Such code is better 
run in a file—the next section’s topic. 
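
For contrast, here is the same pair of statements as they might appear in a script file (call it loop.py, a name invented for this sketch). No blank line is needed after the loop; the unindented print simply ends the loop's body.

```python
# File: loop.py (hypothetical)

for x in 'spam':
    print(x)                     # indented: part of the loop, runs once per character
print('done')                    # unindented: runs once, after the loop finishes
```

Run from a file, this prints the four characters of the string on separate lines followed by done, with no SyntaxError.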


System Command Lines and Files 


Although the interactive prompt is great for experimenting and testing, it has one big 
disadvantage: programs you type there go away as soon as the Python interpreter ex- 
ecutes them. Because the code you type interactively is never stored in a file, you can’t 
run it again without retyping it from scratch. Cut-and-paste and command recall can 
help some here, but not much, especially when you start writing larger programs. To 
cut and paste code from an interactive session, you would have to edit out Python 
prompts, program outputs, and so on—not exactly a modern software development 
methodology! 


To save programs permanently, you need to write your code in files, which are usually 
known as modules. Modules are simply text files containing Python statements. Once 
coded, you can ask the Python interpreter to execute the statements in such a file any 
number of times, and in a variety of ways—by system command lines, by file icon clicks, 
by options in the IDLE user interface, and more. Regardless of how it is run, Python 
executes all the code in a module file from top to bottom each time you run the file. 


Terminology in this domain can vary somewhat. For instance, module files are often 
referred to as programs in Python—that is, a program is considered to be a series of 
precoded statements stored in a file for repeated execution. Module files that are run 
directly are also sometimes called scripts—an informal term usually meaning a top-level 
program file. Some reserve the term “module” for a file imported from another file. 
(More on the meaning of “top-level” and imports in a few moments.) 


Whatever you call them, the next few sections explore ways to run code typed into 
module files. In this section, you’ll learn how to run files in the most basic way: by 
listing their names in a python command line entered at your computer’s system 
prompt. Though it might seem primitive to some, for many programmers a system shell 
command-line window, together with a text editor window, constitutes as much of an 
integrated development environment as they will ever need. 


A First Script 


Let’s get started. Open your favorite text editor (e.g., vi, Notepad, or the IDLE editor), 
and type the following statements into a new text file named script1.py: 


# A first Python script 

import sys # Load a library module 
print(sys.platform) 

print(2 ** 100) # Raise 2 to a power 

x = 'Spam!' 

print(x * 8) # String repetition 


This file is our first official Python script (not counting the two-liner in Chapter 2). You 
shouldn’t worry too much about this file’s code, but as a brief description, this file: 


• Imports a Python module (libraries of additional tools), to fetch the name of the 
platform 
• Runs three print function calls, to display the script’s results 
• Uses a variable named x, created when it’s assigned, to hold onto a string object 
• Applies various object operations that we’ll begin studying in the next chapter 


The sys.platform here is just a string that identifies the kind of computer you’re working 
on; it lives in a standard Python module called sys, which you must import to load 
(again, more on imports later). 


For color, I’ve also added some formal Python comments here—the text after the # 
characters. Comments can show up on lines by themselves, or to the right of code on 
a line. The text after a # is simply ignored as a human-readable comment and is not 
considered part of the statement’s syntax. If you’re copying this code, you can ignore 
the comments as well. In this book, we usually use a different formatting style to make 
comments more visually distinctive, but they'll appear as normal text in your code. 


Again, don’t focus on the syntax of the code in this file for now; we’ll learn about all 
of it later. The main point to notice is that you’ve typed this code into a file, rather than 
at the interactive prompt. In the process, you’ve coded a fully functional Python script. 


Notice that the module file is called script1.py. As for all top-level files, it could also be 
called simply script, but files of code you want to import into a client have to end with 
a .py suffix. We'll study imports later in this chapter. Because you may want to import 
them in the future, it’s a good idea to use .py suffixes for most Python files that you 
code. Also, some text editors detect Python files by their .py suffix; if the suffix is not 
present, you may not get features like syntax colorization and automatic indentation. 


Running Files with Command Lines 


Once you’ve saved this text file, you can ask Python to run it by listing its full filename 
as the first argument to a python command, typed at the system shell prompt: 

% python script1.py 

win32 

1267650600228229401496703205376 

Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam! 


Again, you can type such a system shell command in whatever your system provides 
for command-line entry—a Windows Command Prompt window, an xterm window, 
or similar. Remember to replace “python” with a full directory path, as before, if your 
PATH setting is not configured. 


If all works as planned, this shell command makes Python run the code in this file line 
by line, and you will see the output of the script’s three print statements—the name 
of the underlying platform, 2 raised to the power 100, and the result of the same string 
repetition expression we saw earlier (again, more on the last two of these in Chapter 4). 
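
If you'd like to double-check the script's output, the following sketch verifies a couple of facts about it. It is my own illustration, not part of script1.py, and the name power is made up; the point is simply that Python integers grow as large as needed, and that sys.platform differs from machine to machine.

```python
import sys

power = 2 ** 100
print(power)                              # the full value, with no loss of precision
print(len(str(power)))                    # how many decimal digits it has

# On Windows, sys.platform starts with 'win'; other systems report other names.
print(sys.platform.startswith('win'))
```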


If all didn’t work as planned, you’ll get an error message—make sure you’ve entered 
the code in your file exactly as shown, and try again. We’ll talk about debugging options 
in the sidebar “Debugging Python Code” on page 67, but at this point in the book 
your best bet is probably rote imitation. 


Because this scheme uses shell command lines to start Python programs, all the usual 
shell syntax applies. For instance, you can route the output of a Python script to a file 
to save it for later use or inspection by using special shell syntax: 


% python script1.py > saveit.txt 


In this case, the three output lines shown in the prior run are stored in the file 
saveit.txt instead of being printed. This is generally known as stream redirection; it 
works for input and output text and is available on Windows and Unix-like systems. 
It also has little to do with Python (Python simply supports it), so we will skip further 
details on shell redirection syntax here. 
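
Shell redirection happens outside your program, but for comparison, a script can also send printed text to a file on its own. The sketch below is an in-program alternative and not the shell syntax just shown: it relies on the file argument of the Python 3.X print function, and it writes to a temporary directory so that it doesn't clobber anything of yours (the saveit.txt name is reused only for familiarity).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'saveit.txt')

with open(path, 'w') as out:
    print('Spam!' * 8, file=out)     # goes to the file, not the console

with open(path) as saved:
    contents = saved.read()

print(contents)                      # show what was stored
```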


If you are working on a Windows platform, this example works the same, but the system 
prompt is normally different: 

C:\Python30> python script1.py 

win32 

1267650600228229401496703205376 

Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam! 


As usual, be sure to type the full path to Python if you haven’t set your PATH environment 
variable to include this path or run a change-directory command to go to the path: 

D:\temp> C:\python30\python script1.py 

win32 

1267650600228229401496703205376 

Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam! 


On all recent versions of Windows, you can also type just the name of your script, and 
omit the name of Python itself. Because newer Windows systems use the Windows 
Registry to find a program with which to run a file, you don’t need to name “python” 
on the command line explicitly to run a .py file. The prior command, for example, could 
be simplified to this on most Windows machines: 


D:\temp> script1.py 


Finally, remember to give the full path to your script file if it lives in a different directory 
from the one in which you are working. For example, the following system command 
line, run from D:\other, assumes Python is in your system path but runs a file located 
elsewhere: 


D:\other> python c:\code\otherscript.py 


If your PATH doesn’t include Python’s directory, and neither Python nor your script file 
is in the directory you’re working in, use full paths for both: 

D:\other> C:\Python30\python c:\code\otherscript.py 
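
If you're ever unsure which paths a command line like these needs, Python itself can tell you where it lives and which directory you are working in. This is only a small sketch using the standard sys and os modules; the names interpreter and working_dir are made up for it.

```python
import os
import sys

interpreter = sys.executable     # full path of the Python program now running
working_dir = os.getcwd()        # the directory the command was run from

print(interpreter)
print(working_dir)
```

A script that is not in the directory printed last is one you'll need to name with a directory path on the command line.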


Using Command Lines and Files 


Running program files from system command lines is also a fairly straightforward 
launch option, especially if you are familiar with command lines in general from prior 
work. For newcomers, though, here are a few pointers about common beginner traps 
that might help you avoid some frustration: 


• Beware of automatic extensions on Windows. If you use the Notepad program 
to code program files on Windows, be careful to pick the type All Files when it 
comes time to save your file, and give the file a .py suffix explicitly. Otherwise, 
Notepad will save your file with a .txt extension (e.g., as script1.py.txt), making it 
difficult to run in some launching schemes. 


Worse, Windows hides file extensions by default, so unless you have changed your 
view options you may not even notice that you’ve coded a text file and not a Python 
file. The file’s icon may give this away—if it doesn’t have a snake on it, you may 
have trouble. Uncolored code in IDLE and files that open to edit instead of run 
when clicked are other symptoms of this problem. 


Microsoft Word similarly adds a .doc extension by default; much worse, it adds 
formatting characters that are not legal Python syntax. As a rule of thumb, always 
pick All Files when saving under Windows, or use a more programmer-friendly 
text editor such as IDLE. IDLE does not even add a .py suffix automatically—a 
feature programmers tend to like, but users do not. 


• Use file extensions and directory paths at system prompts, but not for imports. 
Don’t forget to type the full name of your file in system command lines; 
that is, use python script1.py rather than python script1. By contrast, Python’s 
import statements, which we’ll meet later in this chapter, omit both the .py file 
suffix and the directory path (e.g., import script1). This may seem trivial, but 
confusing these two is a common mistake. 


At the system prompt, you are in a system shell, not Python, so Python’s module 
file search rules do not apply. Because of that, you must include both the .py ex- 
tension and, if necessary, the full directory path leading to the file you wish to run. 
For instance, to run a file that resides in a different directory from the one in 
which you are working, you would typically list its full path (e.g., 
python d:\tests\spam.py). Within Python code, however, you can just say 
import spam and rely on the Python module search path to locate your file, as 
described later. 


• Use print statements in files. Yes, we’ve already been over this, but it is such a 
common mistake that it’s worth repeating at least once here. Unlike in interactive 
coding, you generally must use print statements to see output from program files. 
If you don’t see any output, make sure you’ve said “print” in your file. Again, 
though, print statements are not required in an interactive session, since Python 
automatically echoes expression results; prints don’t hurt here, but are superfluous 
extra typing. 
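
The second pointer above says that imports rely on Python's module search path rather than on a directory path you type. As a rough sketch, that search path is visible as the list sys.path, and an import names a module without any .py suffix or directory. The standard os module stands in here for a file of your own.

```python
import sys

# The directories Python searches, in order, when it runs an import.
for directory in sys.path:
    print(directory)

import os                        # no ".py" and no directory path in an import
print(os.__name__)               # the module's name, as Python knows it
```

At a system prompt, by contrast, none of these directories are consulted; the shell needs the file's full name and location.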


Unix Executable Scripts (#!) 


If you are going to use Python on a Unix, Linux, or Unix-like system, you can also turn 
files of Python code into executable programs, much as you would for programs coded 
in a shell language such as csh or ksh. Such files are usually called executable scripts. 
In simple terms, Unix-style executable scripts are just normal text files containing Py- 
thon statements, but with two special properties: 


• Their first line is special. Scripts usually start with a line that begins with the 
characters #! (often called “hash bang”), followed by the path to the Python in- 
terpreter on your machine. 


• They usually have executable privileges. Script files are usually marked as executable 
to tell the operating system that they may be run as top-level programs. 
On Unix systems, a command such as chmod +x file.py usually does the trick. 


Let’s look at an example for Unix-like systems. Use your text editor again to create a 
file of Python code called brian: 


#! /usr/local/bin/python 
print('The Bright Side ' + 'of Life...') # + means concatenate for strings 


The special line at the top of the file tells the system where the Python interpreter lives. 
Technically, the first line is a Python comment. As mentioned earlier, all comments in 
Python programs start with a # and span to the end of the line; they are a place to insert 
extra information for human readers of your code. But when a comment such as the 
first line in this file appears, it’s special because the operating system uses it to find an 
interpreter for running the program code in the rest of the file. 


Also, note that this file is called simply brian, without the .py suffix used for the module 
file earlier. Adding a .py to the name wouldn’t hurt (and might help you remember that 
this is a Python program file), but because you don’t plan on letting other modules 
import the code in this file, the name of the file is irrelevant. If you give the file executable 
privileges with a chmod +x brian shell command, you can run it from the operating 
system shell as though it were a binary program: 


% brian 
The Bright Side of Life... 


A note for Windows users: the method described here is a Unix trick, and it may not 
work on your platform. Not to worry; just use the basic command-line technique ex- 
plored earlier. List the file’s name on an explicit python command line:* 


* As we discussed when exploring command lines, modern Windows versions also let you type just the name 
of a .py file at the system command line—they use the Registry to determine that the file should be opened 
with Python (e.g., typing brian.py is equivalent to typing python brian.py). This command-line mode is 
similar in spirit to the Unix #!, though it is system-wide on Windows, not per-file. Note that some 
programs may actually interpret and use a first #! line on Windows much like on Unix, but the DOS system 
shell on Windows simply ignores it. 


C:\misc> python brian 
The Bright Side of Life... 


In this case, you don’t need the special #! comment at the top (although Python just 
ignores it if it’s present), and the file doesn’t need to be given executable privileges. In 
fact, if you want to run files portably between Unix and Microsoft Windows, your life 
will probably be simpler if you always use the basic command-line approach, not Unix- 
style scripts, to launch programs. 


The Unix env Lookup Trick 


On some Unix systems, you can avoid hardcoding the path to the Python interpreter 
by writing the special first-line comment like this: 


#!/usr/bin/env python 
...script goes here... 


When coded this way, the env program locates the Python interpreter according to your 
system search path settings (i.e., in most Unix shells, by looking in all the directories 
listed in the PATH environment variable). This scheme can be more portable, as you 
don’t need to hardcode a Python install path in the first line of all your scripts. 


Provided you have access to env everywhere, your scripts will run no matter where 
Python lives on your system—you need only change the PATH environment variable 
settings across platforms, not in the first line in all your scripts. Of course, this assumes 
that env lives in the same place everywhere (on some machines, it may be 
in /sbin, /bin, or elsewhere); if not, all portability bets are off! 
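
To get a feel for the lookup that env performs, you can ask Python to do a similar search of the directories on your PATH. The sketch below uses shutil.which from the standard library (available in Python 3.3 and later, so not in the 3.0 release used elsewhere in this chapter); it approximates the idea and is not how env itself is implemented. The name found is my own.

```python
import shutil

# Search the PATH directories for a program named "python", roughly as
# "#!/usr/bin/env python" would. The result is None if nothing is found.
found = shutil.which('python')

if found is None:
    print('No program named python on the search path')
else:
    print('env would run:', found)
```

On systems that install the interpreter only under a name such as python3, this search comes up empty, and an env-based first line naming python would fail in the same way.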


Clicking File Icons 


On Windows, the Registry makes opening files with icon clicks easy. Python automat- 
ically registers itself to be the program that opens Python program files when they are 
clicked. Because of that, it is possible to launch the Python programs you write by 
simply clicking (or double-clicking) on their file icons with your mouse cursor. 


On non-Windows systems, you will probably be able to perform a similar trick, but 
the icons, file explorer, navigation schemes, and more may differ slightly. On some 
Unix systems, for instance, you may need to register the .py extension with your file 
explorer GUI, make your script executable using the #! trick discussed in the previous 
section, or associate the file MIME type with an application or command by editing 
files, installing programs, or using other tools. See your file explorer’s documentation 
for more details if clicks do not work correctly right off the bat. 


Clicking Icons on Windows 


To illustrate, let’s keep using the script we wrote earlier, script1.py, repeated here to 
minimize page flipping: 


# A first Python script 


import sys # Load a library module 
print(sys.platform) 

print(2 ** 100) # Raise 2 to a power 

x = 'Spam!' 

print(x * 8) # String repetition 


As we’ve seen, you can always run this file from a system command line: 


C:\misc> c:\python30\python script1.py 
win32 
1267650600228229401496703205376 


However, icon clicks allow you to run the file without any typing at all. If you find this 
file’s icon—for instance, by selecting Computer (or My Computer in XP) in your Start 
menu and working your way down on the C drive on Windows—you will get the file 
explorer picture captured in Figure 3-1 (Windows Vista is being used here). Python 
source files show up with white backgrounds on Windows, and byte code files show 
up with black backgrounds. You will normally want to click (or otherwise run) the 
source code file, in order to pick up your most recent changes. To launch the file here, 
simply click on the icon for script1.py. 


Figure 3-1. On Windows, Python program files show up as icons in file explorer windows and can 
automatically be run with a double-click of the mouse (though you might not see printed output or 
error messages this way). 


The input Trick 


Unfortunately, on Windows, the result of clicking on a file icon may not be incredibly 
satisfying. In fact, as it is, this example script generates a perplexing “flash” when 
clicked—not exactly the sort of feedback that budding Python programmers usually 
hope for! This is not a bug, but has to do with the way the Windows version of Python 
handles printed output. 


By default, Python generates a pop-up black DOS console window to serve as a clicked 
file’s input and output. If a script just prints and exits, well, it just prints and exits— 
the console window appears, and text is printed there, but the console window closes 
and disappears on program exit. Unless you are very fast, or your machine is very slow, 
you won’t get to see your output at all. Although this is normal behavior, it’s probably 
not what you had in mind. 


Luckily, it’s easy to work around this. If you need your script’s output to stick around 
when you launch it with an icon click, simply put a call to the built-in input function 
at the very bottom of the script (raw_input in 2.6: see the note ahead). For example: 


# A first Python script 


import sys # Load a library module 
print(sys.platform) 

print(2 ** 100) # Raise 2 to a power 

x = 'Spam!' 

print(x * 8) # String repetition 
input() # <== ADDED 


In general, input reads the next line of standard input, waiting if there is none yet 
available. The net effect in this context will be to pause the script, thereby keeping the 
output window shown in Figure 3-2 open until you press the Enter key. 


[Screenshot: a console window titled C:\Python30\python.exe displaying the script’s three output lines.] 


Figure 3-2. When you click a program’s icon on Windows, you will be able to see its printed output 
if you include an input call at the very end of the script. But you only need to do so in this context! 


Now that I’ve shown you this trick, keep in mind that it is usually only required for 
Windows, and then only if your script prints text and exits and only if you will launch 
the script by clicking its file icon. You should add this call to the bottom of your top- 
level files if and only if all of these three conditions apply. There is no reason to add 
this call in any other contexts (unless you’re unreasonably fond of pressing your computer’s 
Enter key!).† That may sound obvious, but it’s another common mistake in live 
classes. 


Before we move ahead, note that the input call applied here is the input counterpart of 
using the print statement for outputs. It is the simplest way to read user input, and it 
is more general than this example implies. For instance, input: 


• Optionally accepts a string that will be printed as a prompt (e.g., input('Press 
Enter to exit')) 

• Returns to your script a line of text read as a string (e.g., nextinput = input()) 

• Supports input stream redirections at the system shell level (e.g., python spam.py 
< input.txt), just as the print statement does for output 


We'll use input in more advanced ways later in this text; for instance, Chapter 10 will 
apply it in an interactive loop. 
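

The three behaviors listed above can be tried without typing anything at a console. What
follows is a minimal sketch rather than code from this book: the function name ask and the
sample text are made up for illustration, and an in-memory io.StringIO object temporarily
stands in for a redirected input file.

```python
import io
import sys

def ask(prompt):
    """Show a prompt, then return the next input line as a string."""
    reply = input(prompt)                   # The trailing newline is dropped
    return reply

# Simulate "python spam.py < input.txt" by swapping in a fake input stream
saved_stdin = sys.stdin
sys.stdin = io.StringIO('Brian\n42\n')      # Two lines of pretend user input
try:
    name = ask('Enter name: ')
    age = ask('Enter age: ')
finally:
    sys.stdin = saved_stdin                 # Always restore the real stream

print()                                     # End the line the prompts were on
print(repr(name), repr(age))                # Both results are strings
print(int(age) + 1)                         # Convert explicitly to do math
```

Because input always returns a string, numeric replies need an explicit conversion such as
int before they can be used in arithmetic.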


Version skew note: If you are working in Python 2.6 or earlier, use
raw_input() instead of input() in this code. The former was renamed to
the latter in Python 3.0. Technically, 2.6 has an input too, but it also
evaluates strings as though they are program code typed into a script,
and so will not work in this context (an empty string is an error). Python
3.0’s input (and 2.6’s raw_input) simply returns the entered text as a
string, unevaluated. To simulate 2.6’s input in 3.0, use eval(input()).


Other Icon-Click Limitations 


Even with the input trick, clicking file icons is not without its perils. You also may not 
get to see Python error messages. If your script generates an error, the error message 
text is written to the pop-up console window—which then immediately disappears! 
Worse, adding an input call to your file will not help this time because your script will 
likely abort long before it reaches this call. In other words, you won't be able to tell 
what went wrong. 


† It is also possible to completely suppress the pop-up DOS console window for clicked files on Windows.
Files whose names end in a .pyw extension will display only windows constructed by your script, not the 
default DOS console window. .pyw files are simply .py source files that have this special operational behavior 
on Windows. They are mostly used for Python-coded user interfaces that build windows of their own, often 
in conjunction with various techniques for saving printed output and errors to files. 
Because of these limitations, it is probably best to view icon clicks as a way to launch
programs after they have been debugged or have been instrumented to write their out- 
put to a file. Especially when starting out, use other techniques—such as system 
command lines and IDLE (discussed further in the section “The IDLE User Inter- 
face” on page 58)—so that you can see generated error messages and view your 
normal output without resorting to coding tricks. When we discuss exceptions later in 
this book, you'll also learn that it is possible to intercept and recover from errors so 
that they do not terminate your programs. Watch for the discussion of the try statement 
later in this book for an alternative way to keep the console window from closing on 
errors. 
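

As a preview of that alternative, here is a minimal sketch under stated assumptions: the
names run_and_pause and main are hypothetical rather than taken from this book, and the
try statement itself is covered later. The idea is to catch any error, print its message,
and still reach the pause. The demonstration call passes print in place of input so that
the example does not wait for a key press; a real clicked script would omit that argument.

```python
import traceback

def run_and_pause(main, pause=input):
    """Run main(); on error, show the traceback, then pause before exiting."""
    try:
        main()
    except Exception:
        traceback.print_exc()             # Same error text Python would print
    finally:
        pause('Press Enter to exit')      # Reached whether or not main failed

def main():                               # Hypothetical script body with a bug
    print(2 ** 100)
    print(undefined_name)                 # Raises NameError

run_and_pause(main, pause=print)          # print stands in for input here
```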


Module Imports and Reloads 


So far, I’ve been talking about “importing modules” without really explaining what this 
term means. We’ll study modules and larger program architecture in depth in Part V, 
but because imports are also a way to launch programs, this section will introduce 
enough module basics to get you started. 


In simple terms, every file of Python source code whose name ends in a .py extension
is a module. Other files can access the items a module defines by importing that module;
import operations essentially load another file and grant access to that file’s contents.
The contents of a module are made available to the outside world through its attributes
(a term I’ll define in the next section).


This module-based services model turns out to be the core idea behind program ar- 
chitecture in Python. Larger programs usually take the form of multiple module files, 
which import tools from other module files. One of the modules is designated as the 
main or top-level file, and this is the one launched to start the entire program. 


We'll delve into such architectural issues in more detail later in this book. This chapter 
is mostly interested in the fact that import operations run the code in a file that is being 
loaded as a final step. Because of this, importing a file is yet another way to launch it. 


For instance, if you start an interactive session (from a system command line, from the 
Start menu, from IDLE, or otherwise), you can run the script1.py file you created earlier 
with a simple import (be sure to delete the input line you added in the prior section 
first, or you’ll need to press Enter for no reason): 

C:\misc> c:\python30\python
>>> import script1
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
This works, but only once per session (really, process) by default. After the first import,
later imports do nothing, even if you change and save the module’s source file again in 
another window: 


>>> import script1 
>>> import script1 


This is by design; imports are too expensive an operation to repeat more than once per 
file, per program run. As you'll learn in Chapter 21, imports must find files, compile 
them to byte code, and run the code. 


If you really want to force Python to run the file again in the same session without 
stopping and restarting the session, you need to instead call the reload function avail- 
able in the imp standard library module (this function is also a simple built-in in Python 
2.6, but not in 3.0): 

>>> from imp import reload            # Must load from module in 3.0
>>> reload(script1)
win32
65536
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
<module 'script1' from 'script1.py'>
>>>


The from statement here simply copies a name out of a module (more on this soon). 
The reload function itself loads and runs the current version of your file’s code, picking 
up changes if you’ve changed and saved it in another window. 


This allows you to edit and pick up new code on the fly within the current Python 
interactive session. In this session, for example, the second print statement in 
script1.py was changed in another window to print 2 ** 16 between the time of the 
first import and the reload call. 


The reload function expects the name of an already loaded module object, so you have 
to have successfully imported a module once before you reload it. Notice that reload 
also expects parentheses around the module object name, whereas import does not. 
reload is a function that is called, and import is a statement. 


That’s why you must pass the module name to reload as an argument in parentheses, 
and that’s why you get back an extra output line when reloading. The last output line 
is just the display representation of the reload call’s return value, a Python module 
object. We’ll learn more about using functions in general in Chapter 16. 
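

The import-once and reload behavior described above can also be reproduced in a single
script. The following is a sketch under stated assumptions, not this book’s code: the
module name reload_demo is hypothetical, the file is written to a temporary directory, and
reload is fetched from importlib, which is where it is found in more recent Python 3
releases (the imp module used in this chapter’s session was later removed).

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True        # Keep the demo free of cached byte code

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, 'reload_demo.py')   # Hypothetical module file

def save(power):
    """Write a tiny module that computes and prints 2 raised to a power."""
    with open(filename, 'w') as f:
        f.write('value = 2 ** %d\nprint(value)\n' % power)

save(100)
sys.path.insert(0, workdir)           # Let import find the new file
importlib.invalidate_caches()

import reload_demo                    # First import: runs the file's code
import reload_demo                    # Later import: runs nothing

save(16)                              # "Edit" the file, as in another window
import reload_demo                    # Still nothing: module already loaded
before = reload_demo.value            # So this is still 2 ** 100

result = importlib.reload(reload_demo)    # Runs the new code, prints 65536
print(result is reload_demo)              # reload returns the module object
```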




Version skew note: Python 3.0 moved the reload built-in function to the
imp standard library module. It still reloads files as before, but you must
import it in order to use it. In 3.0, run an import imp and use
imp.reload(M), or run a from imp import reload and use reload(M), as
shown here. We’ll discuss import and from statements in the next section,
and more formally later in this book.


If you are working in Python 2.6 (or 2.X in general), reload is available 
as a built-in function, so no import is required. In Python 2.6, reload is 
available in both forms—built-in and module function—to aid the tran- 
sition to 3.0. In other words, reloading is still available in 3.0, but an 
extra line of code is required to fetch the reload call. 


The move in 3.0 was likely motivated in part by some well-known issues 
involving reload and from statements that we’ll encounter in the next 
section. In short, names loaded with a from are not directly updated by 
a reload, but names accessed with an import statement are. If your 
names don’t seem to change after a reload, try using import and 
module.attribute name references instead.
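

The reload and from interaction just described can be seen in a short experiment. This is
an illustrative sketch only: the module name skew_demo and its title variable are
hypothetical, the file lives in a temporary directory, and reload comes from importlib as
in more recent Python 3 releases.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True            # Keep the demo free of cached byte code

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, 'skew_demo.py')    # Hypothetical module file

def save(text):
    with open(filename, 'w') as f:
        f.write(text)

save('title = "old"\n')
sys.path.insert(0, workdir)
importlib.invalidate_caches()

import skew_demo                          # skew_demo names the module object
from skew_demo import title               # title is a copy made right now

save('title = "new and improved"\n')      # "Edit" the file
importlib.reload(skew_demo)               # Rerun the file in the same module

print(skew_demo.title)                    # Qualified name sees the change
print(title)                              # Copied name still has the old value
stale = title

from skew_demo import title               # Running the from again recopies
print(title)
```

The qualified reference skew_demo.title is looked up in the module each time it runs, which
is why it reflects the reload, while the name copied by from keeps its earlier value until
the from statement is run again.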


The Grander Module Story: Attributes 


Imports and reloads provide a natural program launch option because import opera- 
tions execute files as a last step. In the broader scheme of things, though, modules serve 
the role of libraries of tools, as you’ll learn in Part V. More generally, a module is mostly 
just a package of variable names, known as a namespace. The names within that package 
are called attributes—an attribute is simply a variable name that is attached to a specific 
object (like a module). 


In typical use, importers gain access to all the names assigned at the top level of a 
module’s file. These names are usually assigned to tools exported by the module— 
functions, classes, variables, and so on—that are intended to be used in other files and 
other programs. Externally, a module file’s names can be fetched with two Python 
statements, import and from, as well as the reload call. 


To illustrate, use a text editor to create a one-line Python module file called myfile.py 
with the following contents: 


title = "The Meaning of Life" 


This may be one of the world’s simplest Python modules (it contains a single assignment 
statement), but it’s enough to illustrate the point. When this file is imported, its code 
is run to generate the module’s attribute. The assignment statement creates a module 
attribute named title. 
You can access this module’s title attribute in other components in two different ways.
First, you can load the module as a whole with an import statement, and then qualify 
the module name with the attribute name to fetch it: 


% python                          # Start Python
>>> import myfile                 # Run file; load module as a whole
>>> print(myfile.title)           # Use its attribute names: '.' to qualify
The Meaning of Life


In general, the dot expression syntax object.attribute lets you fetch any attribute 
attached to any object, and this is a very common operation in Python code. Here, 
we've used it to access the string variable title inside the module myfile—in other 
words, myfile.title. 


Alternatively, you can fetch (really, copy) names out of a module with from statements: 


% python                          # Start Python
>>> from myfile import title      # Run file; copy its names
>>> print(title)                  # Use name directly: no need to qualify
The Meaning of Life


As you'll see in more detail later, from is just like an import, with an extra assignment 
to names in the importing component. Technically, from copies a module’s attributes, 
such that they become simple variables in the recipient—thus, you can simply refer to 
the imported string this time as title (a variable) instead of myfile.title (an attribute 
reference).‡


Whether you use import or from to invoke an import operation, the statements in the 
module file myfile.py are executed, and the importing component (here, the interactive 
prompt) gains access to names assigned at the top level of the file. There’s only one 
such name in this simple example—the variable title, assigned to a string—but the 
concept will be more useful when you start defining objects such as functions and 
classes in your modules: such objects become reusable software components that can 
be accessed by name from one or more client modules. 


In practice, module files usually define more than one name to be used in and outside 
the files. Here’s an example that defines three: 


a = 'dead'                        # Define three attributes
b = 'parrot'                      # Exported to other files
c = 'sketch'
print(a, b, c)                    # Also used in this file


This file, threenames.py, assigns three variables, and so generates three attributes for 
the outside world. It also uses its own three variables in a print statement, as we see 
when we run this as a top-level file: 


‡ Notice that import and from both list the name of the module file as simply myfile without its .py suffix. As
you'll learn in Part V, when Python looks for the actual file, it knows to include the suffix in its search
procedure. Again, you must include the .py suffix in system shell command lines, but not in import statements.
% python threenames.py
dead parrot sketch 


All of this file’s code runs as usual the first time it is imported elsewhere (by either an 
import or from). Clients of this file that use import get a module with attributes, while 
clients that use from get copies of the file’s names: 

% python
>>> import threenames                    # Grab the whole module
dead parrot sketch
>>>
>>> threenames.b, threenames.c
('parrot', 'sketch')
>>>
>>> from threenames import a, b, c       # Copy multiple names
>>> b, c
('parrot', 'sketch')


The results here are printed in parentheses because they are really tuples (a kind of 
object covered in the next part of this book); you can safely ignore them for now. 


Once you start coding modules with multiple names like this, the built-in dir function 
starts to come in handy—you can use it to fetch a list of the names available inside a 
module. The following returns a Python list of strings (we’ll start studying lists in the 
next chapter): 


>>> dir(threenames)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'a', 'b', 'c']

I ran this on Python 3.0 and 2.6; older Pythons may return fewer names. When the 
dir function is called with the name of an imported module passed in parentheses like 
this, it returns all the attributes inside that module. Some of the names it returns are 
names you get “for free”: names with leading and trailing double underscores are built- 
in names that are always predefined by Python and that have special meaning to the 
interpreter. The variables our code defined by assignment—a, b, and c—show up last 
in the dir result. 
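

If you want to see only the names your own code assigned, the predefined names can be
filtered out of the dir result. The following is a small sketch, not part of the book’s
session: to keep it runnable without a file on disk, it builds a stand-in for
threenames.py in memory with types.ModuleType, and the variable own_names is made up for
illustration.

```python
import types

# Build a stand-in for threenames.py in memory (an assumption made so the
# example runs without a module file on disk)
threenames = types.ModuleType('threenames')
exec("a = 'dead'\nb = 'parrot'\nc = 'sketch'", threenames.__dict__)

# Keep only the names that lack the leading double underscores
own_names = [name for name in dir(threenames) if not name.startswith('__')]
print(own_names)
```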


Modules and namespaces 


Module imports are a way to run files of code, but, as we’ll discuss later in the book, 
modules are also the largest program structure in Python programs. 


In general, Python programs are composed of multiple module files, linked together by 
import statements. Each module file is a self-contained package of variables—that is, 
a namespace. One module file cannot see the names defined in another file unless it 
explicitly imports that other file, so modules serve to minimize name collisions in your 
code—because each file is a self-contained namespace, the names in one file cannot 
clash with those in another, even if they are spelled the same way. 
In fact, as you’ll see, modules are one of a handful of ways that Python goes to great
lengths to package your variables into compartments to avoid name clashes. We’ll 
discuss modules and other namespace constructs (including classes and function 
scopes) further later in the book. For now, modules will come in handy as a way to run 
your code many times without having to retype it. 
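

To make the namespace idea concrete, here is a hedged sketch in which the same variable
name lives in three places without any clash. The module names first and second and the
helper make_module are hypothetical, and the modules are created in memory instead of
being loaded from files, purely to keep the example self-contained.

```python
import types

def make_module(name, source):
    """Create a module object in memory and run source code inside it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

# Two hypothetical modules that both assign the same variable name
first = make_module('first', "title = 'The Meaning of Life'")
second = make_module('second', "title = 'The Holy Grail'")

title = 'my own title'              # A third, unrelated title in this file

print(first.title)                  # Each name is reached through its module
print(second.title)
print(title)
```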


import versus from: I should point out that the from statement in a sense
defeats the namespace partitioning purpose of modules: because the
from copies variables from one file to another, it can cause same-named
variables in the importing file to be overwritten (and won’t warn you if
it does). This essentially collapses namespaces together, at least in terms
of the copied variables.


Because of this, some recommend using import instead of from. I won’t 
go that far, though; not only does from involve less typing, but its pur- 
ported problem is rarely an issue in practice. Besides, this is something 
you control by listing the variables you want in the from; as long as you 
understand that they’ll be assigned values, this is no more dangerous 
than coding assignment statements—another feature you'll probably 
want to use! 
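

The overwriting effect is easy to demonstrate with a standard library module. This sketch
is an illustration, not code from this book; the variable names are arbitrary, and math.e
is used only because it is a short name that is easy to collide with.

```python
e = 'my own variable'        # A name already in use in this file

from math import e           # Silently replaces it with math's constant
after_from = e
print(after_from)            # Now a float, not the string above

import math                  # By contrast, import adds just one name: math
e = 'my own variable'        # Reassign; math.e is unaffected
print(e, math.e)             # The two names no longer collide
```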


import and reload Usage Notes 


For some reason, once people find out about running files using import and reload, 
many tend to focus on this alone and forget about other launch options that always 
run the current version of the code (e.g., icon clicks, IDLE menu options, and system 
command lines). This approach can quickly lead to confusion, though—you need to 
remember when you’ve imported to know if you can reload, you need to remember to
use parentheses when you call reload (only), and you need to remember to use 
reload in the first place to get the current version of your code to run. Moreover, reloads 
aren’t transitive—reloading a module reloads that module only, not any modules it 
may import—so you sometimes have to reload multiple files. 


Because of these complications (and others we’ll explore later, including the reload/ 
from issue mentioned in a prior note in this chapter), it’s generally a good idea to avoid 
the temptation to launch by imports and reloads for now. The IDLE Run>Run Module 
menu option described in the next section, for example, provides a simpler and less 
error-prone way to run your files, and always runs the current version of your code. 
System shell command lines offer similar benefits. You don’t need to use reload if you 
use these techniques. 


In addition, you may run into trouble if you use modules in unusual ways at this point 
in the book. For instance, if you want to import a module file that is stored in a directory 
other than the one you’re working in, you’ll have to skip ahead to Chapter 21 and learn 
about the module search path. 
For now, if you must import, try to keep all your files in the directory you are working
in to avoid complications.§ 
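

If you are curious before reaching Chapter 21, the search path can be inspected directly.
The following is a minimal sketch: the directory name mymodules is hypothetical, and
appending to sys.path is shown here as one option that lasts only for the current process,
separate from the PYTHONPATH setting.

```python
import os
import sys

for directory in sys.path[:3]:          # Show just the first few entries
    print(repr(directory))

extra = os.path.join(os.getcwd(), 'mymodules')   # Hypothetical directory name
if extra not in sys.path:
    sys.path.append(extra)              # Lasts only for this process
```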


That said, imports and reloads have proven to be a popular testing technique in Python 
classes, and you may prefer using this approach too. As usual, though, if you find 
yourself running into a wall, stop running into a wall! 


Using exec to Run Module Files 


In fact, there are more ways to run code stored in module files than have yet been 
exposed here. For instance, the exec(open('module.py').read()) built-in function call 
is another way to launch files from the interactive prompt without having to import 
and later reload. Each exec runs the current version of the file, without requiring later 
reloads (script1.py is as we left it after a reload in the prior section): 

C:\misc> c:\python30\python
>>> exec(open('script1.py').read())
win32
65536
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!

...change script1.py in a text edit window...

>>> exec(open('script1.py').read())
win32
4294967296
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!


The exec call has an effect similar to an import, but it doesn’t technically import the
module—by default, each time you call exec this way it runs the file anew, as though 
you had pasted it in at the place where exec is called. Because of that, exec does not 
require module reloads after file changes—it skips the normal module import logic. 


On the downside, because it works as if pasting code into the place where it is called, 
exec, like the from statement mentioned earlier, has the potential to silently overwrite 
variables you may currently be using. For example, our script1.py assigns to a variable 
named x. If that name is also being used in the place where exec is called, the name’s 
value is replaced: 


>>> x = 999
>>> exec(open('script1.py').read())        # Code run in this namespace by default
...same output...
>>> x                                      # Its assignments can overwrite names here
'Spam!'


§ If you’re burning with curiosity, the short story is that Python searches for imported modules in every directory 
listed in sys.path—a Python list of directory name strings in the sys module, which is initialized from a 
PYTHONPATH environment variable, plus a set of standard directories. If you want to import from a directory 
other than the one you are working in, that directory must generally be listed in your PYTHONPATH setting. For 
more details, see Chapter 21. 




By contrast, the basic import statement runs the file only once per process, and it makes 
the file a separate module namespace so that its assignments will not change variables 
in your scope. The price you pay for the namespace partitioning of modules is the need 
to reload after changes. 
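

There is also a middle ground, sketched here as an aside that this chapter does not cover:
exec accepts an optional dictionary to use as the namespace for the code it runs, so the
code’s assignments land there instead of in your own scope. In this sketch the string
source stands in for the text returned by open('script1.py').read(), and the name scratch
is hypothetical.

```python
source = "x = 'Spam!'\nprint(x * 8)\n"    # Stands in for the file's text

x = 999
scratch = {}                  # A separate dictionary to act as the namespace
exec(source, scratch)         # Assignments land in scratch, not in this scope

print(x)                      # Our own x was not replaced
print(scratch['x'])           # The script's x is available here instead
```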


Version skew note: Python 2.6 also includes an execfile('module.py')
built-in function, in addition to allowing the form
exec(open('module.py')), which both automatically read the file’s
content. Both of these are equivalent to the
exec(open('module.py').read()) form, which is more complex but
runs in both 2.6 and 3.0.


Unfortunately, neither of these two simpler 2.6 forms is available in 3.0, 
which means you must understand both files and their read methods to 
fully understand this technique today (alas, this seems to be a case of 
aesthetics trouncing practicality in 3.0). In fact, the exec form in 3.0 
involves so much typing that the best advice may simply be not to do 
it—it’s usually best to launch files by typing system shell command lines 
or by using the IDLE menu options described in the next section. For 
more on the 3.0 exec form, see Chapter 9. 


The IDLE User Interface 


So far, we’ve seen how to run Python code with the interactive prompt, system com- 
mand lines, icon clicks, and module imports and exec calls. If you’re looking for some- 
thing a bit more visual, IDLE provides a graphical user interface for doing Python 
development, and it’s a standard and free part of the Python system. It is usually referred 
to as an integrated development environment (IDE), because it binds together various 
development tasks into a single view.‖


In short, IDLE is a GUI that lets you edit, run, browse, and debug Python programs, 
all from a single interface. Moreover, because IDLE is a Python program that uses the 
tkinter GUI toolkit (known as Tkinter in 2.6), it runs portably on most Python plat- 
forms, including Microsoft Windows, X Windows (for Linux, Unix, and Unix-like 
platforms), and the Mac OS (both Classic and OS X). For many, IDLE represents an 
easy-to-use alternative to typing command lines, and a less problem-prone alternative 
to clicking on icons. 


IDLE Basics 


Let’s jump right into an example. IDLE is easy to start under Windows—it has an entry 
in the Start button menu for Python (see Figure 2-1, shown previously), and it can also 
be selected by right-clicking on a Python program icon. On some Unix-like systems, 


‖ IDLE is officially a corruption of IDE, but it’s really named in honor of Monty Python member Eric Idle.
you may need to launch IDLE’s top-level script from a command line, or by clicking
on the icon for the idle.pyw or idle.py file located in the idlelib subdirectory of Python’s
Lib directory. On Windows, IDLE is a Python script that currently lives in
C:\Python30\Lib\idlelib (or C:\Python26\Lib\idlelib in Python 2.6).#


Figure 3-3 shows the scene after starting IDLE on Windows. The Python shell window 
that opens initially is the main window, which runs an interactive session (notice the 
>>> prompt). This works like all interactive sessions—code you type here is run im- 
mediately after you type it—and serves as a testing tool. 


[Screenshot: IDLE’s Python Shell window, running Python 3.1 on win32. The interactive
session shown includes a RESTART line followed by the three lines of output from script1.py.]


Figure 3-3. The main Python shell window of the IDLE development GUI, shown here running on 
Windows. Use the File menu to begin (New Window) or change (Open...) a source file; use the text 
edit window’s Run menu to run the code in that window (Run Module). 


# IDLE is a Python program that uses the standard library’s tkinter GUI toolkit (a.k.a. Tkinter in Python 2.6)
to build the IDLE GUI. This makes IDLE portable, but it also means that you’ll need to have tkinter support
in your Python to use IDLE. The Windows version of Python has this by default, but some Linux and Unix 
users may need to install the appropriate tkinter support (a yum tkinter command may suffice on some Linux 
distributions, but see the installation hints in Appendix A for details). Mac OS X may have everything you 
need preinstalled, too; look for an idle command or script on your machine. 
IDLE uses familiar menus with keyboard shortcuts for most of its operations. To make
(or edit) a source code file under IDLE, open a text edit window: in the main window, 
select the File pull-down menu, and pick New Window (or Open... to open a text edit 
window displaying an existing file for editing). 


Although it may not show up fully in this book’s graphics, IDLE uses syntax-directed 
colorization for the code typed in both the main window and all text edit windows— 
keywords are one color, literals are another, and so on. This helps give you a better 
picture of the components in your code (and can even help you spot mistakes— 
run-on strings are all one color, for example). 


To run a file of code that you are editing in IDLE, select the file’s text edit window, 
open that window’s Run pull-down menu, and choose the Run Module option listed 
there (or use the equivalent keyboard shortcut, given in the menu). Python will let you 
know that you need to save your file first if you’ve changed it since it was opened or 
last saved and forgot to save your changes—a common mistake when you’re knee deep 
in coding. 


When run this way, the output of your script and any error messages it may generate 
show up back in the main interactive window (the Python shell window). In Fig- 
ure 3-3, for example, the three lines after the “RESTART” line near the middle of the 
window reflect an execution of our script1.py file opened in a separate edit window. 
The “RESTART” message tells us that the user-code process was restarted to run the 
edited script and serves to separate script output (it does not appear if IDLE is started 
without a user-code subprocess—more on this mode in a moment). 


IDLE hint of the day: If you want to repeat prior commands in IDLE’s
main interactive window, you can use the Alt-P key combination to
scroll backward through the command history, and Alt-N to scroll forward
(on some Macs, try Ctrl-P and Ctrl-N instead). Your prior commands
will be recalled and displayed, and may be edited and rerun. You
can also recall commands by positioning the cursor on them, or use
cut-and-paste operations, but these techniques tend to involve more
work. Outside IDLE, you may be able to recall commands in an interactive
session with the arrow keys on Windows.


Using IDLE 


IDLE is free, easy to use, portable, and automatically available on most platforms. I 
generally recommend it to Python newcomers because it sugarcoats some of the details 
and does not assume prior experience with system command lines. However, it is 
somewhat limited compared to more advanced commercial IDEs. To help you avoid 
some common pitfalls, here is a list of issues that IDLE beginners should bear in mind: 


* You must add “.py” explicitly when saving your files. I mentioned this when
talking about files in general, but it’s a common IDLE stumbling block, especially
for Windows users. IDLE does not automatically add a .py extension to filenames
when files are saved. Be careful to type the .py extension yourself when saving a
file for the first time. If you don’t, while you will be able to run your file from IDLE
(and system command lines), you will not be able to import it either interactively
or from other modules.


* Run scripts by selecting Run>Run Module in text edit windows, not by
interactive imports and reloads. Earlier in this chapter, we saw that it’s possible
to run a file by importing it interactively. However, this scheme can grow complex
because it requires you to manually reload files after changes. By contrast, using 
the Run>Run Module menu option in IDLE always runs the most current version 
of your file, just like running it using a system shell command line. IDLE also 
prompts you to save your file first, if needed (another common mistake outside 
IDLE). 


* You need to reload only modules being tested interactively. Like system shell
command lines, IDLE’s Run>Run Module menu option always runs the current 
version of both the top-level file and any modules it imports. Because of this, 
Run>Run Module eliminates common confusions surrounding imports. You only 
need to reload modules that you are importing and testing interactively in IDLE. 
If you choose to use the import and reload technique instead of Run>Run Module, 
remember that you can use the Alt-P/Alt-N key combinations to recall prior 
commands. 


* You can customize IDLE. To change the text fonts and colors in IDLE, select the
Configure option in the Options menu of any IDLE window. You can also cus- 
tomize key combination actions, indentation settings, and more; see IDLE’s Help 
pull-down menu for more hints. 


* There is currently no clear-screen option in IDLE. This seems to be a frequent
request (perhaps because it’s an option available in similar IDEs), and it might be 
added eventually. Today, though, there is no way to clear the interactive window’s 
text. If you want the window’s text to go away, you can either press and hold the 
Enter key, or type a Python loop to print a series of blank lines (nobody really uses 
the latter technique, of course, but it sounds more high-tech than pressing the Enter 
key!). 

* tkinter GUI and threaded programs may not work well with IDLE. Because
IDLE is a Python/tkinter program, it can hang if you use it to run certain types of 
advanced Python/tkinter programs. This has become less of an issue in more recent 
versions of IDLE that run user code in one process and the IDLE GUI itself in 
another, but some programs (especially those that use multithreading) might still 
hang the GUI. Your code may not exhibit such problems, but as a rule of thumb, 
it’s always safe to use IDLE to edit GUI programs but launch them using other 
options, such as icon clicks or system command lines. When in doubt, if your code 
fails in IDLE, try it outside the GUI. 
* If connection errors arise, try starting IDLE in single-process mode. Because
IDLE requires communication between its separate user and GUI processes, it can 
sometimes have trouble starting up on certain platforms (notably, it fails to start 
occasionally on some Windows machines, due to firewall software that blocks 
connections). If you run into such connection errors, it’s always possible to start 
IDLE with a system command line that forces it to run in single-process mode 
without a user-code subprocess and therefore avoids communication issues: its 
-n command-line flag forces this mode. On Windows, for example, start a Com- 
mand Prompt window and run the system command line idle.py -n from within 
the directory C:\Python30\Lib\idlelib (cd there first if needed). 


* Beware of some IDLE usability features. IDLE does much to make life easier
for beginners, but some of its tricks won’t apply outside the IDLE GUI. For instance,
IDLE runs your scripts in its own interactive namespace, so variables in
your code show up automatically in the IDLE interactive session—you don’t al- 
ways need to run import commands to access names at the top level of files you’ve 
already run. This can be handy, but it can also be confusing, because outside the 
IDLE environment names must always be imported from files to be used. 


IDLE also automatically changes to the directory of a file just run and adds
that directory to the module import search path. This handy feature allows you
to import files there without search path settings, but it won’t work the same
way when you run files outside IDLE. It’s OK to use such features, but
don’t forget that they are IDLE behavior, not Python behavior.
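
The last point can be made concrete with a minimal sketch of what you have to do yourself when running outside IDLE. The module name idle_demo_mod and the temporary directory are hypothetical, invented only for this illustration; the sys.path change stands in for the search path setting that IDLE applies automatically after a run.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module file, created only for this illustration
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'idle_demo_mod.py'), 'w') as f:
    f.write("title = 'The Meaning of Life'\n")

# Outside IDLE, nothing has added the file's directory to the search path
found_before = workdir in sys.path

sys.path.insert(0, workdir)        # the step IDLE performs for you after a run
importlib.invalidate_caches()      # make sure the newly written file is noticed
import idle_demo_mod               # names must be imported before they are used

print(found_before, idle_demo_mod.title)
```

Without the sys.path line, the import would normally fail unless the file happened to live in a directory already on the search path.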


Advanced IDLE Tools 


Besides the basic edit and run functions, IDLE provides more advanced features, in- 
cluding a point-and-click program debugger and an object browser. The IDLE debugger 
is enabled via the Debug menu and the object browser via the File menu. The browser 
allows you to navigate through the module search path to files and objects in files; 
clicking on a file or object opens the corresponding source in a text edit window. 


IDLE debugging is initiated by selecting the Debug>Debugger menu option in the main
window and then starting your script by selecting the Run>Run Module option in the 
text edit window; once the debugger is enabled, you can set breakpoints in your code 
that stop its execution by right-clicking on lines in the text edit windows, show variable 
values, and so on. You can also watch program execution when debugging—the current 
line of code is noted as you step through your code. 


For simpler debugging operations, you can also right-click with your mouse on the text 
of an error message to quickly jump to the line of code where the error occurred—a 
trick that makes it simple and fast to repair and run again. In addition, IDLE’s text 
editor offers a large collection of programmer-friendly tools, including automatic
indentation, advanced text and file search operations, and more. Because IDLE uses
intuitive GUI interactions, you should experiment with the system live to get a feel for
its other tools.


Other IDEs 


Because IDLE is free, portable, and a standard part of Python, it’s a nice first develop- 
ment tool to become familiar with if you want to use an IDE at all. Again, I recommend 
that you use IDLE for this book’s exercises if you’re just starting out, unless you are 
already familiar with and prefer a command-line-based development mode. There are, 
however, a handful of alternative IDEs for Python developers, some of which are sub- 
stantially more powerful and robust than IDLE. Here are some of the most commonly 
used IDEs: 


Eclipse and PyDev 

Eclipse is an advanced open source IDE GUI. Originally developed as a Java IDE, 
Eclipse also supports Python development when you install the PyDev (or a similar)
plug-in. Eclipse is a popular and powerful option for Python development, and it 
goes well beyond IDLE’s feature set. It includes support for code completion, syn- 
tax highlighting, syntax analysis, refactoring, debugging, and more. Its downsides 
are that it is a large system to install and may require shareware extensions for some 
features (this may vary over time). Still, when you are ready to graduate from IDLE, 
the Eclipse/PyDev combination is worth your attention. 


Komodo 

A full-featured development environment GUI for Python (and other languages), 
Komodo includes standard syntax-coloring, text-editing, debugging, and other 
features. In addition, Komodo offers many advanced features that IDLE does not, 
including project files, source-control integration, regular-expression debugging, 
and a drag-and-drop GUI builder that generates Python/tkinter code to implement 
the GUIs you design interactively. At this writing, Komodo is not free; it is available 
at http://www.activestate.com. 


NetBeans IDE for Python 

NetBeans is a powerful open-source development environment GUI with support 
for many advanced features for Python developers: code completion, automatic 
indentation and code colorization, editor hints, code folding, refactoring, debug- 
ging, code coverage and testing, projects, and more. It may be used to develop both 
CPython and Jython code. Like Eclipse, NetBeans requires installation steps be- 
yond those of the included IDLE GUI, but it is seen by many as more than worth 
the effort. Search the Web for the latest information and links. 


PythonWin 
PythonWin is a free Windows-only IDE for Python that ships as part of ActiveState’s
ActivePython distribution (and may also be fetched separately from
http://www.python.org resources). It is roughly like IDLE, with a handful of useful
Windows-specific extensions added; for example, PythonWin has support for
COM objects. Today, IDLE is probably more advanced than PythonWin (for instance,
IDLE’s dual-process architecture often prevents it from hanging). However,
PythonWin still offers tools for Windows developers that IDLE does not. See
http://www.activestate.com for more information.


Others 
There are roughly half a dozen other widely used IDEs that I’m aware of (including 
the commercial Wing IDE and PythonCard) but do not have space to do justice to 
here, and more will probably appear over time. In fact, almost every programmer- 
friendly text editor has some sort of support for Python development these days, 
whether it be preinstalled or fetched separately. Emacs and Vim, for instance, have 
substantial Python support. 


I won’t try to document all such options here; for more information, see the re- 
sources available at http://www.python.org or search the Web for “Python IDE.” 
You might also try running a web search for “Python editors”—today, this leads 
you to a wiki page that maintains information about many IDE and text-editor 
options for Python programming. 


Other Launch Options 


At this point, we’ve seen how to run code typed interactively, and how to launch code 
saved in files in a variety of ways—system command lines, imports and execs, GUIs 
like IDLE, and more. That covers most of the cases you’ll see in this book. There are 
additional ways to run Python code, though, most of which have special or narrow 
roles. The next few sections take a quick look at some of these. 


Embedding Calls 


In some specialized domains, Python code may be run automatically by an enclosing 
system. In such cases, we say that the Python programs are embedded in (i.e., run by) 
another program. The Python code itself may be entered into a text file, stored in a 
database, fetched from an HTML page, parsed from an XML document, and so on. 
But from an operational perspective, another system—not you—may tell Python to 
run the code you’ve created.


Such an embedded execution mode is commonly used to support end-user customi- 
zation—a game program, for instance, might allow for play modifications by running 
user-accessible embedded Python code at strategic points in time. Users can modify 
this type of system by providing or changing Python code. Because Python code is 
interpreted, there is no need to recompile the entire system to incorporate the change 
(see Chapter 2 for more on how Python code is run).


In this mode, the enclosing system that runs your code might be written in C, C++, or
even Java when the Jython system is used. As an example, it’s possible to create and 
run strings of Python code from a C program by calling functions in the Python runtime 
API (a set of services exported by the libraries created when Python is compiled on your 
machine): 


#include <Python.h> 


Py_Initialize();                                     // This is C, not Python
PyRun_SimpleString("x = 'brave ' + 'sir robin'");    // But it runs Python code


In this C code snippet, a program coded in the C language embeds the Python inter- 
preter by linking in its libraries, and passes it a Python assignment statement string to 
run. C programs may also gain access to Python modules and objects and process or 
execute them using other Python API tools. 
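
The same idea can be tried without a C compiler. The sketch below is a Python-level analogy, not the C API itself: an enclosing program (here, just another Python script) holds Python code as a string and runs it in a namespace it controls, using the built-in exec function. The variable names are illustrative only.

```python
# The "enclosing system" keeps the user's code as a plain string
code = "x = 'brave ' + 'sir robin'"

# Run the string in a dictionary that serves as its namespace
namespace = {}
exec(code, namespace)

# The enclosing program can now fetch results produced by the embedded code
print(namespace['x'])
```

Because the code is only a string, it could just as well have come from a file, a database, or a configuration setting, which is the property that makes embedding useful for end-user customization.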


This book isn’t about Python/C integration, but you should be aware that, depending 
on how your organization plans to use Python, you may or may not be the one who 
actually starts the Python programs you create. Regardless, you can usually still use the 
interactive and file-based launching techniques described here to test code in isolation 
from those enclosing systems that may eventually use it. 


Frozen Binary Executables 


Frozen binary executables, described in Chapter 2, are packages that combine your 
program’s byte code and the Python interpreter into a single executable program. This 
approach enables Python programs to be launched in the same ways that you would 
launch any other executable program (icon clicks, command lines, etc.). While this 
option works well for delivery of products, it is not really intended for use during pro- 
gram development; you normally freeze just before shipping (after development is 
finished). See the prior chapter for more on this option. 


Text Editor Launch Options 


As mentioned previously, although they’re not full-blown IDE GUIs, most program- 
mer-friendly text editors have support for editing, and possibly running, Python 
programs. Such support may be built in or fetchable on the Web. For instance, if you 
are familiar with the Emacs text editor, you can do all your Python editing and launching
from inside that text editor. See the text editor resources page at
http://www.python.org/editors for more details, or search the Web for the phrase “Python editors.”


* See Programming Python (O’Reilly) for more details on embedding Python in C/C++. The embedding API 
can call Python functions directly, load modules, and more. Also, note that the Jython system allows Java 
programs to invoke Python code using a Java-based API (a Python interpreter class).


Still Other Launch Options


Depending on your platform, there may be additional ways that you can start Python 
programs. For instance, on some Macintosh systems you may be able to drag Python 
program file icons onto the Python interpreter icon to make them execute, and on 
Windows you can always start Python scripts with the Run... option in the Start menu. 
Additionally, the Python standard library has utilities that allow Python programs to 
be started by other Python programs in separate processes (e.g., os.popen, os.system),
and Python scripts might also be spawned in larger contexts like the Web (for instance, 
a web page might invoke a script on a server); however, these are beyond the scope of 
the present chapter. 
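
As a brief taste of the standard library route, here is a minimal sketch that starts a second Python process with os.popen and reads what it prints. It assumes that sys.executable names a usable interpreter and that the platform's shell accepts the quoting shown; the one-line child program is purely illustrative.

```python
import os
import sys

# sys.executable is the path of the interpreter running this script; the
# quotes guard against spaces in that path (an assumption about the shell)
command = '"%s" -c "print(2 ** 8)"' % sys.executable

pipe = os.popen(command)     # start the child process, connected to its output
output = pipe.read()         # read everything the child printed
status = pipe.close()        # None means the child exited without error

print(output.strip(), status)
```

os.system runs a command line in a similar way but returns only an exit status, so os.popen is the one to reach for when the parent needs the child's output.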


Future Possibilities? 


This chapter reflects current practice, but much of the material is both platform- and 
time-specific. Indeed, many of the execution and launch details presented arose during 
the shelf life of this book’s various editions. As with program execution options, it’s 
not impossible that new program launch options may arise over time. 


New operating systems, and new versions of existing systems, may also provide exe- 
cution techniques beyond those outlined here. In general, because Python keeps pace 
with such changes, you should be able to launch Python programs in whatever way 
makes sense for the machines you use, both now and in the future—be that by drawing 
on tablet PCs or PDAs, grabbing icons in a virtual reality, or shouting a script’s name 
over your coworkers’ conversations. 


Implementation changes may also impact launch schemes somewhat (e.g., a full com- 
piler could produce normal executables that are launched much like frozen binaries 
today). If I knew what the future truly held, though, I would probably be talking to a 
stockbroker instead of writing these words! 


Which Option Should I Use?


With all these options, one question naturally arises: which one is best for me? In 
general, you should give the IDLE interface a try if you are just getting started with 
Python. It provides a user-friendly GUI environment and hides some of the underlying 
configuration details. It also comes with a platform-neutral text editor for coding your 
scripts, and it’s a standard and free part of the Python system. 


If, on the other hand, you are an experienced programmer, you might be more com- 
fortable with simply the text editor of your choice in one window, and another window 
for launching the programs you edit via system command lines and icon clicks (in fact, 
this is how I develop Python programs, but I have a Unix-biased past). Because the 
choice of development environments is very subjective, I can’t offer much more in the
way of universal guidelines; in general, whatever environment you like to use will be
the best for you to use.


Debugging Python Code 


Naturally, none of my readers or students ever have bugs in their code (insert smiley 
here), but for less fortunate friends of yours who may, here’s a quick look at the strat- 
egies commonly used by real-world Python programmers to debug code: 


* Do nothing. By this, I don’t mean that Python programmers don’t debug their 
code—but when you make a mistake in a Python program, you get a very useful 
and readable error message (you'll get to see some soon, if you haven’t already). 
If you already know Python, and especially for your own code, this is often 
enough—read the error message, and go fix the tagged line and file. For many, this 
is debugging in Python. It may not always be ideal for larger systems you didn’t
write, though. 


* Insert print statements. Probably the main way that Python programmers debug
their code (and the way that I debug Python code) is to insert print statements and 
run again. Because Python runs immediately after changes, this is usually the 
quickest way to get more information than error messages provide. The print 
statements don’t have to be sophisticated—a simple “I am here” or display of 
variable values is usually enough to provide the context you need. Just remember 
to delete or comment out (i.e., add a # before) the debugging prints before you 
ship your code! 


* Use IDE GUI debuggers. For larger systems you didn’t write, and for beginners
who want to trace code in more detail, most Python development GUIs have some 
sort of point-and-click debugging support. IDLE has a debugger too, but it doesn’t 
appear to be used very often in practice—perhaps because it has no command line, 
or perhaps because adding print statements is usually quicker than setting up a 
GUI debugging session. To learn more, see IDLE’s Help, or simply try it on your 
own; its basic interface is described in the section “Advanced IDLE 
Tools” on page 62. Other IDEs, such as Eclipse, NetBeans, Komodo, and Wing 
IDE, offer advanced point-and-click debuggers as well; see their documentation if 
you use them. 


* Use the pdb command-line debugger. For ultimate control, Python comes with 
a source-code debugger named pdb, available as a module in Python’s standard 
library. In pdb, you type commands to step line by line, display variables, set and 
clear breakpoints, continue to a breakpoint or error, and so on. pdb can be 
launched interactively by importing it, or as a top-level script. Either way, because 
you can type commands to control the session, it provides a powerful debugging 
tool. pdb also includes a postmortem function you can run after an exception 
occurs, to get information from the time of the error. See the Python library manual 
and Chapter 35 for more details on pdb. 


* Other options. For more specific debugging requirements, you can find additional
tools in the open source domain, including support for multithreaded programs,
embedded code, and process attachment. The Winpdb system, for example, is a
standalone debugger with advanced debugging support and cross-platform GUI
and console interfaces.
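
The print-statement technique from the list above might look like the following minimal sketch. The average function and its data are hypothetical; the line marked DEBUG is the temporary trace you would delete or comment out before shipping.

```python
def average(values):
    total = 0
    for value in values:
        total += value
        print('DEBUG: value =', value, 'total =', total)   # temporary trace
    return total / len(values)

result = average([2, 4, 6])
print(result)

# For more control, the same file could be run under pdb from a system
# command line (python -m pdb script.py), or a pdb.set_trace() call could
# be placed on the line where stepping should begin.
```

The trace shows the running total after every pass through the loop, which is usually enough context to spot where a computation goes wrong.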


These options will become more important as we start writing larger scripts. Prob- 
ably the best news on the debugging front, though, is that errors are detected and 
reported in Python, rather than passing silently or crashing the system altogether. 
In fact, errors themselves are a well-defined mechanism known as exceptions, 
which you can catch and process (more on exceptions in Part VII). Making mis- 
takes is never fun, of course, but speaking as someone who recalls when debugging 
meant getting out a hex calculator and poring over piles of memory dump print- 
outs, Python’s debugging support makes errors much less painful than they might 
otherwise be. 
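
As a small preview of the exception mechanism just mentioned, the sketch below catches an error instead of letting the default handler print a message and stop the program. The safe_divide function is a hypothetical example, and the full story is deferred to Part VII.

```python
def safe_divide(a, b):
    try:
        return a / b                     # may raise ZeroDivisionError
    except ZeroDivisionError as exc:
        print('caught:', exc)            # the error is reported, not fatal
        return None

print(safe_divide(1, 2))
print(safe_divide(1, 0))
```

If the try statement were removed, the second call would end the script with Python's standard error message and traceback.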


Chapter Summary 


In this chapter, we’ve looked at common ways to launch Python programs: by running 
code typed interactively, and by running code stored in files with system command 
lines, file-icon clicks, module imports, exec calls, and IDE GUIs such as IDLE. We’ve 
covered a lot of pragmatic startup territory here. This chapter’s goal was to equip you 
with enough information to enable you to start writing some code, which you'll do in 
the next part of the book. There, we will start exploring the Python language itself, 
beginning with its core data types. 


First, though, take the usual chapter quiz to exercise what you’ve learned here. Because
this is the last chapter in this part of the book, it’s followed with a set of more complete 
exercises that test your mastery of this entire part’s topics. For help with the latter set 
of problems, or just for a refresher, be sure to turn to Appendix B after you’ve given 
the exercises a try. 


Test Your Knowledge: Quiz 


1. How can you start an interactive interpreter session?

2. Where do you type a system command line to launch a script file?

3. Name four or more ways to run the code saved in a script file.

4. Name two pitfalls related to clicking file icons on Windows.

5. Why might you need to reload a module?

6. How do you run a script from within IDLE?

7. Name two pitfalls related to using IDLE.

8. What is a namespace, and how does it relate to module files?


Test Your Knowledge: Answers


1. You can start an interactive session on Windows by clicking your Start button, 
picking the All Programs option, clicking the Python entry, and selecting the “Py- 
thon (command line)” menu option. You can also achieve the same effect on Win- 
dows and other platforms by typing python as a system command line in your 
system’s console window (a Command Prompt window on Windows). Another 
alternative is to launch IDLE, as its main Python shell window is an interactive 
session. If you have not set your system’s PATH variable to find Python, you may 
need to cd to where Python is installed, or type its full directory path instead of just 
python (e.g., C:\Python30\python on Windows).


2. You type system command lines in whatever your platform provides as a system 
console: a Command Prompt window on Windows; an xterm or terminal window 
on Unix, Linux, and Mac OS X; and so on. 


3. Code in a script (really, module) file can be run with system command lines, file 
icon clicks, imports and reloads, the exec built-in function, and IDE GUI selections 
such as IDLE’s Run>Run Module menu option. On Unix, they can also be run as 
executables with the #! trick, and some platforms support more specialized launch- 
ing techniques (e.g., drag-and-drop). In addition, some text editors have unique 
ways to run Python code, some Python programs are provided as standalone “fro- 
zen binary” executables, and some systems use Python code in embedded mode, 
where it is run automatically by an enclosing program written in a language like 
C, C++, or Java. The latter technique is usually done to provide a user customi- 
zation layer. 


4. Scripts that print and then exit cause the output window to disappear immediately,
before you can view the output (which is why the input trick comes in handy); 
error messages generated by your script also appear in an output window that 
closes before you can examine its contents (which is one reason that system com- 
mand lines and IDEs such as IDLE are better for most development). 


5. Python only imports (loads) a module once per process, by default, so if you’ve 
changed its source code and want to run the new version without stopping and 
restarting Python, you'll have to reload it. You must import a module at least once 
before you can reload it. Running files of code from a system shell command line, 
via an icon click, or via an IDE such as IDLE generally makes this a nonissue, as 
those launch schemes usually run the current version of the source code file each 
time. 


6. Within the text edit window of the file you wish to run, select the window’s 
Run>Run Module menu option. This runs the window’s source code as a top-level 
script file and displays its output back in the interactive Python shell window. 

7. IDLE can still be hung by some types of programs, especially GUI programs that
perform multithreading (an advanced technique beyond this book’s scope). Also,
IDLE has some usability features that can burn you once you leave the IDLE GUI:
a script’s variables are automatically imported to the interactive scope in IDLE, for
instance, but not by Python in general.


8. A namespace is just a package of variables (i.e., names). It takes the form of an 
object with attributes in Python. Each module file is automatically a namespace— 
that is, a package of variables reflecting the assignments made at the top level of 
the file. Namespaces help avoid name collisions in Python programs: because each 
module file is a self-contained namespace, files must explicitly import other files 
in order to use their names. 
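
Answers 5 and 8 can be illustrated together with a minimal sketch. The module name quiz_demo_mod and the temporary directory are hypothetical, and the reload call is spelled importlib.reload here on the assumption that you are running a current Python 3 release.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module file, written to a temporary directory
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'quiz_demo_mod.py')
sys.path.insert(0, workdir)
importlib.invalidate_caches()

with open(path, 'w') as f:
    f.write("message = 'first version'\n")

import quiz_demo_mod                     # first import loads and runs the file
first = quiz_demo_mod.message

with open(path, 'w') as f:               # change the source on disk
    f.write("message = 'second version'\n")

import quiz_demo_mod                     # already loaded: the file is not rerun
after_import = quiz_demo_mod.message

importlib.reload(quiz_demo_mod)          # reload reruns the current source
after_reload = quiz_demo_mod.message

print(first, '|', after_import, '|', after_reload)
print('message' in vars(quiz_demo_mod))  # the module object is a namespace
```

The second import leaves the old value in place, and only the reload picks up the edit. The final line shows that a top-level assignment in the file becomes an attribute of the module object.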


It’s time to start doing a little coding on your own. This first exercise session is fairly
simple, but a few of these questions hint at topics to come in later chapters. Be sure to 
check “Part I, Getting Started” on page 1101 in the solutions appendix (Appendix B) 
for the answers; the exercises and their solutions sometimes contain supplemental in- 
formation not discussed in the main text, so you should take a peek at the solutions 
even if you manage to answer all the questions on your own. 


1. Interaction. Using a system command line, IDLE, or another method, start the 
Python interactive command line (>>> prompt), and type the expression "Hello 
World!" (including the quotes). The string should be echoed back to you. The 
purpose of this exercise is to get your environment configured to run Python. In 
some scenarios, you may need to first run a cd shell command, type the full path 
to the Python executable, or add its path to your PATH environment variable. If 
desired, you can set PATH in your .cshrc or .kshrc file to make Python permanently
available on Unix systems; on Windows, use a setup.bat, autoexec.bat, or the en- 
vironment variable GUI. See Appendix A for help with environment variable 
settings. 


2. Programs. With the text editor of your choice, write a simple module file containing 
the single statement print('Hello module world!') and store it as module1.py.
Now, run this file by using any launch option you like: running it in IDLE, clicking 
on its file icon, passing it to the Python interpreter on the system shell’s command 
line (e.g., python module1.py), built-in exec calls, imports and reloads, and so on. 
In fact, experiment by running your file with as many of the launch techniques 
discussed in this chapter as you can. Which technique seems easiest? (There is no 
right answer to this, of course.) 


3. Modules. Start the Python interactive command line (>>> prompt) and import the 
module you wrote in exercise 2. Try moving the file to a different directory and 
importing it again from its original directory (i.e., run Python in the original di- 
rectory when you import). What happens? (Hint: is there still a module1.pyc byte 
code file in the original directory?)


4. Scripts. If your platform supports it, add the #! line to the top of your
module1.py module file, give the file executable privileges, and run it directly as an 
executable. What does the first line need to contain? #! usually only has meaning 
on Unix, Linux, and Unix-like platforms such as Mac OS X; if you’re working on 
Windows, instead try running your file by listing just its name in a DOS console 
window without the word “python” before it (this works on recent versions of 
Windows), or via the Start>Run... dialog box. 


5. Errors and debugging. Experiment with typing mathematical expressions and as- 
signments at the Python interactive command line. Along the way, type the ex- 
pressions 2 ** 500 and 1 / 0, and reference an undefined variable name as we did 
in this chapter. What happens? 


You may not know it yet, but when you make a mistake, you’re doing exception 
processing (a topic we'll explore in depth in Part VII). As you'll learn there, you 
are technically triggering what’s known as the default exception handler—logic that 
prints a standard error message. If you do not catch an error, the default handler 
does and prints the standard error message in response. 


Exceptions are also bound up with the notion of debugging in Python. When you’re 
first starting out, Python’s default error messages on exceptions will probably pro- 
vide as much error-handling support as you need—they give the cause of the error, 
as well as showing the lines in your code that were active when the error occurred. 
For more about debugging, see the sidebar “Debugging Python Code” 
on page 67. 


6. Breaks and cycles. At the Python command line, type: 


L = [1, 2] # Make a 2-item list 
L.append(L) # Append L as a single item to itself 
L # Print L 


What happens? In all recent versions of Python, you'll see a strange output that 
we'll describe in the solutions appendix, and which will make more sense when 
we study references in the next part of the book. If you’re using a Python version 
older than 1.5.1, a Ctrl-C key combination will probably help on most platforms. 
Why do you think your version of Python responds the way it does for this code? 


If you do have a Python older than Release 1.5.1 (a hopefully rare
scenario today!), make sure your machine can stop a program with
a Ctrl-C key combination of some sort before running this test, or
you may be waiting a long time.


7. Documentation. Spend at least 17 minutes browsing the Python library and lan- 
guage manuals before moving on to get a feel for the available tools in the standard 
library and the structure of the documentation set. It takes at least this long to 
become familiar with the locations of major topics in the manual set; once you’ve 
done this, it’s easy to find what you need. You can find this manual via the Python
Start button entry on Windows, in the Python Docs option on the Help pull-down
menu in IDLE, or online at http://www.python.org/doc. I’ll also have a few more
words to say about the manuals and other documentation sources available
(including PyDoc and the help function) in Chapter 15. If you still have time, go
explore the Python website, as well as its PyPI third-party extension repository.
Especially check out the Python.org documentation and search pages; they can be
crucial resources.


PART II


Types and Operations 


CHAPTER 4 
Introducing Python Object Types 


This chapter begins our tour of the Python language. In an informal sense, in Python, 
we do things with stuff. “Things” take the form of operations like addition and con- 
catenation, and “stuff” refers to the objects on which we perform those operations. In 
this part of the book, our focus is on that stuff, and the things our programs can do 
with it. 


Somewhat more formally, in Python, data takes the form of objects—either built-in 
objects that Python provides, or objects we create using Python or external language 
tools such as C extension libraries. Although we'll firm up this definition later, objects 
are essentially just pieces of memory, with values and sets of associated operations. 


Because objects are the most fundamental notion in Python programming, we'll start 
this chapter with a survey of Python’s built-in object types. 


By way of introduction, however, let’s first establish a clear picture of how this chapter 
fits into the overall Python picture. From a more concrete perspective, Python programs 
can be decomposed into modules, statements, expressions, and objects, as follows: 

1. Programs are composed of modules. 

2. Modules contain statements. 

3. Statements contain expressions. 

4. Expressions create and process objects.

The discussion of modules in Chapter 3 introduced the highest level of this hierarchy.
This part’s chapters begin at the bottom, exploring both built-in objects and the
expressions you can code to use them.


Why Use Built-in Types?


If you’ve used lower-level languages such as C or C++, you know that much of your 
work centers on implementing objects—also known as data structures—to represent 
the components in your application’s domain. You need to lay out memory structures, 
manage memory allocation, implement search and access routines, and so on. These 
chores are about as tedious (and error-prone) as they sound, and they usually distract 
from your program’s real goals. 


In typical Python programs, most of this grunt work goes away. Because Python pro- 
vides powerful object types as an intrinsic part of the language, there’s usually no need 
to code object implementations before you start solving problems. In fact, unless you 
have a need for special processing that built-in types don’t provide, you’re almost al- 
ways better off using a built-in object instead of implementing your own. Here are some 
reasons why: 


* Built-in objects make programs easy to write. For simple tasks, built-in types
are often all you need to represent the structure of problem domains. Because you 
get powerful tools such as collections (lists) and search tables (dictionaries) for free, 
you can use them immediately. You can get a lot of work done with Python’s built- 
in object types alone. 


* Built-in objects are components of extensions. For more complex tasks, you
may need to provide your own objects using Python classes or C language inter- 
faces. But as you'll see in later parts of this book, objects implemented manually 
are often built on top of built-in types such as lists and dictionaries. For instance, 
a stack data structure may be implemented as a class that manages or customizes 
a built-in list. 

* Built-in objects are often more efficient than custom data structures. Python’s
built-in types employ already optimized data structure algorithms that are
implemented in C for speed. Although you can write similar object types on your 
own, you'll usually be hard-pressed to get the level of performance built-in object 
types provide. 


* Built-in objects are a standard part of the language. In some ways, Python
borrows both from languages that rely on built-in tools (e.g., LISP) and languages 
that rely on the programmer to provide tool implementations or frameworks of 
their own (e.g., C++). Although you can implement unique object types in Python, 
you don’t need to do so just to get started. Moreover, because Python’s built-ins 
are standard, they’re always the same; proprietary frameworks, on the other hand, 
tend to differ from site to site. 
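
As a rough illustration of the second point in the list above, here is a minimal sketch of a stack coded as a class that manages a built-in list. The class name Stack and its method names are hypothetical, chosen only for this example; classes themselves are covered later in the book.

```python
class Stack:
    """A minimal stack that wraps a built-in list (illustrative names only)."""

    def __init__(self):
        self._items = []              # the built-in list does the real work

    def push(self, item):
        self._items.append(item)      # add at the top (the end of the list)

    def pop(self):
        return self._items.pop()      # remove and return the top item

    def __len__(self):
        return len(self._items)       # lets the built-in len() work on a Stack


stack = Stack()
stack.push('spam')
stack.push(42)
top = stack.pop()                     # last in, first out
print(top, len(stack))
```

Nearly all of the behavior here is borrowed from the list; the class only narrows the interface to stack-like operations.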


In other words, not only do built-in object types make programming easier, but they’re
also more powerful and efficient than most of what can be created from scratch. Re- 
gardless of whether you implement new object types, built-in objects form the core of 
every Python program. 
Python’s Core Data Types


Table 4-1 previews Python’s built-in object types and some of the syntax used to code 
their literals—that is, the expressions that generate these objects.* Some of these types
will probably seem familiar if you’ve used other languages; for instance, numbers and 
strings represent numeric and textual values, respectively, and files provide an interface 
for processing files stored on your computer. 


Table 4-1. Built-in objects preview

Object type                     Example literals/creation
Numbers                         1234, 3.1415, 3+4j, Decimal, Fraction
Strings                         'spam', "guido's", b'a\x01c'
Lists                           [1, [2, 'three'], 4]
Dictionaries                    {'food': 'spam', 'taste': 'yum'}
Tuples                          (1, 'spam', 4, 'U')
Files                           myfile = open('eggs', 'r')
Sets                            set('abc'), {'a', 'b', 'c'}
Other core types                Booleans, types, None
Program unit types              Functions, modules, classes (Part IV, Part V, Part VI)
Implementation-related types    Compiled code, stack tracebacks (Part IV, Part VII)


Table 4-1 isn’t really complete, because everything we process in Python programs is a 
kind of object. For instance, when we perform text pattern matching in Python, we 
create pattern objects, and when we perform network scripting, we use socket objects. 
These other kinds of objects are generally created by importing and using modules and 
have behavior all their own. 


As we'll see in later parts of the book, program units such as functions, modules, and 
classes are objects in Python too—they are created with statements and expressions 
such as def, class, import, and lambda and may be passed around scripts freely, stored 
within other objects, and so on. Python also provides a set of implementation-related 
types such as compiled code objects, which are generally of interest to tool builders 
more than application developers; these are also discussed in later parts of this text. 


We usually call the other object types in Table 4-1 core data types, though, because 
they are effectively built into the Python language—that is, there is specific expression 
syntax for generating most of them. For instance, when you run the following code: 


>>> 'spam' 


you are, technically speaking, running a literal expression that generates and returns a
new string object. There is specific Python language syntax to make this object. Similarly,
an expression wrapped in square brackets makes a list, one in curly braces makes
a dictionary, and so on. Even though, as we’ll see, there are no type declarations in
Python, the syntax of the expressions you run determines the types of objects you create
and use. In fact, object-generation expressions like those in Table 4-1 are generally
where types originate in the Python language.


* In this book, the term literal simply means an expression whose syntax generates an object—sometimes also
called a constant. Note that the term “constant” does not imply objects or variables that can never be changed
(i.e., this term is unrelated to C++’s const or Python’s “immutable”—a topic explored in the section
“Immutability” on page 82).


Just as importantly, once you create an object, you bind its operation set for all time— 
you can perform only string operations on a string and list operations on a list. As you’ll
learn, Python is dynamically typed (it keeps track of types for you automatically instead 
of requiring declaration code), but it is also strongly typed (you can perform on an object 
only operations that are valid for its type). 
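
To make the distinction concrete, here is a small sketch that shows both properties at once. The variable names are arbitrary, and the try statement used to catch the error is a tool we won't study until later in the book.

```python
x = 'spam'                   # x refers to a string...
x = 42                       # ...and now to an integer; no declarations needed

try:
    result = 'spam' + 1      # mixing a string and a number is not a string operation
except TypeError:
    result = None            # Python reports an error instead of guessing

mixed = 'spam' + str(1)      # an explicit conversion states what we mean
print(x, result, mixed)
```

Dynamic typing is why the name x can be reassigned freely; strong typing is why the mixed-type + fails until we convert one operand ourselves.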


Functionally, the object types in Table 4-1 are more general and powerful than what 
you may be accustomed to. For instance, you’ll find that lists and dictionaries alone 
are powerful data representation tools that obviate most of the work you do to support 
collections and searching in lower-level languages. In short, lists provide ordered col- 
lections of other objects, while dictionaries store objects by key; both lists and dic- 
tionaries may be nested, can grow and shrink on demand, and may contain objects of 
any type. 

We'll study each of the object types in Table 4-1 in detail in upcoming chapters. Before 
digging into the details, though, let’s begin by taking a quick look at Python’s core 
objects in action. The rest of this chapter provides a preview of the operations we'll 
explore in more depth in the chapters that follow. Don’t expect to find the full story 
here—the goal of this chapter is just to whet your appetite and introduce some key 
ideas. Still, the best way to get started is to get started, so let’s jump right into some 
real code. 


Numbers 


If you’ve done any programming or scripting in the past, some of the object types in 
Table 4-1 will probably seem familiar. Even if you haven’t, numbers are fairly straight- 
forward. Python’s core objects set includes the usual suspects: integers (numbers with- 
out a fractional part), floating-point numbers (roughly, numbers with a decimal point 
in them), and more exotic numeric types (complex numbers with imaginary parts, 
fixed-precision decimals, rational fractions with numerator and denominator, and full- 
featured sets). 


Although it offers some fancier options, Python’s basic number types are, well, basic. 
Numbers in Python support the normal mathematical operations. For instance, the 
plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**) 
are used for exponentiation: 
>>> 123 + 222                   # Integer addition
345
>>> 1.5 * 4                     # Floating-point multiplication
6.0
>>> 2 ** 100                    # 2 to the power 100
1267650600228229401496703205376


Notice the last result here: Python 3.0’s integer type automatically provides extra pre- 
cision for large numbers like this when needed (in 2.6, a separate long integer type 
handles numbers too large for the normal integer type in similar ways). You can, for 
instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably 
shouldn’t try to print the result—with more than 300,000 digits, you may be waiting 
awhile! 


>>> len(str(2 ** 1000000))      # How many digits in a really BIG number?
301030 


Once you start experimenting with floating-point numbers, you’re likely to stumble 
across something that may look a bit odd on first glance: 


>>> 3.1415 * 2                  # repr: as code
6.2830000000000004
>>> print(3.1415 * 2)           # str: user-friendly
6.283


The first result isn’t a bug; it’s a display issue. It turns out that there are two ways to 
print every object: with full precision (as in the first result shown here), and in a user- 
friendly form (as in the second). Formally, the first form is known as an object’s as- 
code repr, and the second is its user-friendly str. The difference can matter when we 
step up to using classes; for now, if something looks odd, try showing it with print
(a built-in function call in 3.0, a statement in 2.6).
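
Both display forms can also be requested explicitly, with the built-in repr and str functions. The sketch below uses a string rather than a float, because the exact digits displayed for floating-point numbers vary between Python releases, while the string case looks the same everywhere.

```python
text = 'spam'

as_code = repr(text)         # includes the quotes you would type in code
friendly = str(text)         # just the characters themselves

# The interactive prompt echoes results using repr; print uses str
print(as_code)
print(friendly)
```

The first print shows the string with its quotes and the second shows it without them, which mirrors the difference between echoing a result at the prompt and printing it.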


Besides expressions, there are a handful of useful numeric modules that ship with 
Python—modules are just packages of additional tools that we import to use: 

>>> import math
>>> math.pi
3.1415926535897931
>>> math.sqrt(85)
9.2195444572928871


The math module contains more advanced numeric tools as functions, while the 
random module performs random number generation and random selections (here, from 
a Python list, introduced later in this chapter): 

>>> import random
>>> random.random()
0.59268735266273953
>>> random.choice([1, 2, 3, 4])
1


Python also includes more exotic numeric objects—such as complex, fixed-precision, 
and rational numbers, as well as sets and Booleans—and the third-party open source 
extension domain has even more (e.g., matrixes and vectors). We’ll defer discussion of
these types until later in the book. 


So far, we’ve been using Python much like a simple calculator; to do better justice to 
its built-in types, let’s move on to explore strings. 


Strings 


Strings are used to record textual information as well as arbitrary collections of bytes. 
They are our first example of what we call a sequence in Python—that is, a positionally 
ordered collection of other objects. Sequences maintain a left-to-right order among the 
items they contain: their items are stored and fetched by their relative position. Strictly 
speaking, strings are sequences of one-character strings; other types of sequences in- 
clude lists and tuples, covered later. 


Sequence Operations 


As sequences, strings support operations that assume a positional ordering among 
items. For example, if we have a four-character string, we can verify its length with the 
built-in len function and fetch its components with indexing expressions: 


>>> S = 'Spam'
>>> len(S)                      # Length
4
>>> S[0]                        # The first item in S, indexing by zero-based position
'S'
>>> S[1]                        # The second item from the left
'p'


In Python, indexes are coded as offsets from the front, and so start from 0: the first item 
is at index 0, the second is at index 1, and so on. 


Notice how we assign the string to a variable named S here. We'll go into detail on how 
this works later (especially in Chapter 6), but Python variables never need to be declared 
ahead of time. A variable is created when you assign it a value, may be assigned any 
type of object, and is replaced with its value when it shows up in an expression. It must 
also have been previously assigned by the time you use its value. For the purposes of 
this chapter, it’s enough to know that we need to assign an object to a variable in order 
to save it for later use. 


In Python, we can also index backward, from the end—positive indexes count from 
the left, and negative indexes count back from the right: 


>>> S[-1]                       # The last item from the end in S
'm'
>>> S[-2]                       # The second to last item from the end
'a'


Formally, a negative index is simply added to the string’s size, so the following two
operations are equivalent (though the first is easier to code and less easy to get wrong): 


>>> S[-1] # The last item in S 

'm' 

>>> S[len(S)-1] # Negative indexing, the hard way 
'm' 


Notice that we can use an arbitrary expression in the square brackets, not just a hard- 
coded number literal—anywhere that Python expects a value, we can use a literal, a 
variable, or any expression. Python’s syntax is completely general this way. 
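
As a quick check of both ideas, the following sketch compares every negative index with its "added to the size" equivalent, and then uses a computed expression as an index. The names same and middle are just illustrative, and the for loop is a statement we'll meet properly later.

```python
S = 'Spam'

same = True
for i in range(1, len(S) + 1):
    if S[-i] != S[len(S) - i]:   # negative index versus the explicit offset
        same = False

middle = S[len(S) // 2]          # any expression can appear in the brackets
print(same, middle)
```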


In addition to simple positional indexing, sequences also support a more general form 
of indexing known as slicing, which is a way to extract an entire section (slice) in a single 
step. For example: 


>>> S                           # A 4-character string
'Spam'
>>> S[1:3]                      # Slice of S from offsets 1 through 2 (not 3)
'pa'


Probably the easiest way to think of slices is that they are a way to extract an entire 
column from a string in a single step. Their general form, X[I:J], means “give me ev- 
erything in X from offset I up to but not including offset J.” The result is returned in a 
new object. The second of the preceding operations, for instance, gives us all the characters
in string S from offsets 1 through 2 (that is, 3 - 1) as a new string. The effect is
to slice or “parse out” the two characters in the middle. 


In a slice, the left bound defaults to zero, and the right bound defaults to the length of
the sequence being sliced. This leads to some common usage variations: 


>>> S[1:]                       # Everything past the first (1:len(S))
'pam'
>>> S                           # S itself hasn't changed
'Spam'
>>> S[0:3]                      # Everything but the last
'Spa'
>>> S[:3]                       # Same as S[0:3]
'Spa'
>>> S[:-1]                      # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:]                        # All of S as a top-level copy (0:len(S))
'Spam'


Note how negative offsets can be used to give bounds for slices, too, and how the last 
operation effectively copies the entire string. As you’ll learn later, there is no reason to 
copy a string, but this form can be useful for sequences like lists. 
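
A few properties follow from the "up to but not including" rule, and they are easy to verify. This is only a sketch with arbitrary offsets i and j; the last line relies on the fact that slice bounds past the end of a sequence are quietly clipped rather than treated as errors.

```python
S = 'Spam'
i = 1
j = 3

piece = S[i:j]                   # the items at offsets i through j - 1
rejoined = S[:i] + S[i:]         # the two halves always rebuild the original
clipped = S[1:100]               # an out-of-range bound is scaled to the length

print(piece, len(piece) == j - i, rejoined == S, clipped)
```

Because the right bound is excluded, the length of a slice is simply j - i, and slicing at the same offset from both sides never drops or duplicates an item.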


Finally, as sequences, strings also support concatenation with the plus sign (joining two 
strings into a new string) and repetition (making a new string by repeating another): 


>>> S
'Spam'
>>> S + 'xyz'                   # Concatenation
'Spamxyz'
>>> S                           # S is unchanged
'Spam'
>>> S * 8                       # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'


Notice that the plus sign (+) means different things for different objects: addition for 
numbers, and concatenation for strings. This is a general property of Python that we’ll 
call polymorphism later in the book—in sum, the meaning of an operation depends on 
the objects being operated on. As you’ll see when we study dynamic typing, this poly- 
morphism property accounts for much of the conciseness and flexibility of Python code. 
Because types aren’t constrained, a Python-coded operation can normally work on 
many different types of objects automatically, as long as they support a compatible 
interface (like the + operation here). This turns out to be a huge idea in Python; you’ll 
learn more about it later on our tour. 
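
Here is a minimal sketch of the idea, using a hypothetical function named double (functions are covered in Part IV). The function never checks the type of its argument; it simply applies + and lets each object decide what that means.

```python
def double(x):
    return x + x                 # what + does depends on the type of x


number = double(21)              # addition for numbers
text = double('Spam')            # concatenation for strings
items = double([1, 2])           # concatenation for lists, too

print(number, text, items)
```

The same one-line function works for all three types because each supports the + interface it expects.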


Immutability 


Notice that in the prior examples, we were not changing the original string with any of 
the operations we ran on it. Every string operation is defined to produce a new string 
as its result, because strings are immutable in Python—they cannot be changed in-place 
after they are created. For example, you can’t change a string by assigning to one of its 
positions, but you can always build a new one and assign it to the same name. Because 
Python cleans up old objects as you go (as you'll see later), this isn’t as inefficient as it 
may sound: 

>>> S
'Spam'
>>> S[0] = 'z'                  # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment

>>> S = 'z' + S[1:]             # But we can run expressions to make new objects
>>> S
'zpam'


Every object in Python is classified as either immutable (unchangeable) or not. In terms
of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are 
not (they can be changed in-place freely). Among other things, immutability can be 
used to guarantee that an object remains constant throughout your program. 
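
When a text value really does need in-place changes, one common approach (not shown in the example above, so treat it as one option among several) is to expand the string into a mutable list, change that, and join the pieces back into a new string. The list built-in and the string join method used here are both standard tools.

```python
S = 'Spam'

chars = list(S)                  # a mutable list of one-character strings
chars[0] = 'z'                   # lists support assignment to positions
S = ''.join(chars)               # build a new string and reassign the name S

print(chars, S)
```

Note that the original string object is still never modified; the name S simply ends up referencing a newly built string.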


Type-Specific Methods 


Every string operation we’ve studied so far is really a sequence operation—that is, these 
operations will work on other sequences in Python as well, including lists and tuples. 
In addition to generic sequence operations, though, strings also have operations all 
their own, available as methods—functions attached to the object, which are triggered 
with a call expression. 
For example, the string find method is the basic substring search operation (it returns
the offset of the passed-in substring, or -1 if it is not present), and the string replace 
method performs global searches and replacements: 

>>> S.find('pa')                # Find the offset of a substring
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ')      # Replace occurrences of a substring with another
'SXYZm'
>>> S
'Spam'


Again, despite the names of these string methods, we are not changing the original 
strings here, but creating new strings as the results—because strings are immutable, 
we have to do it this way. String methods are the first line of text-processing tools in 
Python. Other methods split a string into substrings on a delimiter (handy as a simple 
form of parsing), perform case conversions, test the content of the string (digits, letters, 
and so on), and strip whitespace characters off the ends of the string: 

>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')             # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']

>>> S = 'spam'
>>> S.upper()                   # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha()                 # Content tests: isalpha, isdigit, etc.
True

>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line = line.rstrip()        # Remove whitespace characters on the right side
>>> line
'aaa,bbb,ccccc,dd'


Strings also support an advanced substitution operation known as formatting, available 
as both an expression (the original) and a string method call (new in 2.6 and 3.0): 


>>> '%s, eggs, and %s' % ('spam', 'SPAM!')          # Formatting expression (all)
'spam, eggs, and SPAM!'

>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')    # Formatting method (2.6, 3.0)
'spam, eggs, and SPAM!'


One note here: although sequence operations are generic, methods are not—although
some types share some method names, string method operations generally work only 
on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic 
operations that span multiple types show up as built-in functions or expressions (e.g., 
len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). 
Finding the tools you need among all these categories will become more natural as you 
use Python more, but the next section gives a few tips you can use right now. 
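
The layering can be seen by applying the same tools to a string and a list. In this sketch, the built-in hasattr function is used only as a convenient way to ask whether an object has a method of a given name; the variable names are arbitrary.

```python
text = 'spam'
items = ['spam', 'eggs']

sizes = (len(text), len(items))             # generic: len works on both
firsts = (text[0], items[0])                # generic: so does indexing

str_has_upper = hasattr(text, 'upper')      # type-specific: strings have upper
list_has_upper = hasattr(items, 'upper')    # lists do not

print(sizes, firsts, str_has_upper, list_has_upper)
```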
Getting Help


The methods introduced in the prior section are a representative, but small, sample of 
what is available for string objects. In general, this book is not exhaustive in its look at 
object methods. For more details, you can always call the built-in dir function, which 
returns a list of all the attributes available for a given object. Because methods are 
function attributes, they will show up in this list. Assuming S is still the string, here are 
its attributes on Python 3.0 (Python 2.6 varies slightly): 


>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier',
'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


You probably won’t care about the names with underscores in this list until later in the
book, when we study operator overloading in classes—they represent the implemen- 
tation of the string object and are available to support customization. In general, leading 
and trailing double underscores is the naming pattern Python uses for implementation 
details. The names without the underscores in this list are the callable methods on string
objects.


The dir function simply gives the methods’ names. To ask what they do, you can pass 
them to the help function: 


>>> help(S.replace)
Help on built-in function replace:

replace(...)
    S.replace(old, new[, count]) -> str

    Return a copy of S with all occurrences of substring
    old replaced by new. If the optional argument count is
    given, only the first count occurrences are replaced.


help is one of a handful of interfaces to a system of code that ships with Python known 
as PyDoc—a tool for extracting documentation from objects. Later in the book, you'll 
see that PyDoc can also render its reports in HTML format. 


You can also ask for help on an entire string (e.g., help(S)), but you may get more help 
than you want to see—i.e., information about every string method. It’s generally better 
to ask about a specific method. 
For more details, you can also consult Python’s standard library reference manual or
commercially published reference books, but dir and help are the first line of docu- 
mentation in Python. 
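
Because dir returns an ordinary list of strings, its result can be processed like any other list. As a sketch, the loop below keeps only the names that do not start with an underscore; the name public is hypothetical, and the exact set of methods you get back depends on your Python version.

```python
S = 'Spam'

public = []
for name in dir(S):
    if not name.startswith('_'):     # skip the implementation-related names
        public.append(name)

print(len(public), 'methods, including:', public[:5])
```

Any name in the resulting list can then be passed along to help, as in help(S.upper).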


Other Ways to Code Strings 


So far, we’ve looked at the string object’s sequence operations and type-specific meth- 
ods. Python also provides a variety of ways for us to code strings, which we’ll explore 
in greater depth later. For instance, special characters can be represented as backslash 
escape sequences: 


>>> S = 'A\nB\tC'               # \n is end-of-line, \t is tab
>>> len(S)                      # Each stands for just one character
5

>>> ord('\n')                   # \n is a byte with the binary value 10 in ASCII
10

>>> S = 'A\0B\0C'               # \0, a binary zero byte, does not terminate string
>>> len(S)
5


Python allows strings to be enclosed in single or double quote characters (they mean 
the same thing). It also allows multiline string literals enclosed in triple quotes (single 
or double)—when this form is used, all the lines are concatenated together, and end-of-line
characters are added where line breaks appear. This is a minor syntactic convenience,
but it’s useful for embedding things like HTML and XML code in a Python
script: 

>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc'


Python also supports a raw string literal that turns off the backslash escape mechanism 
(such string literals start with the letter r), as well as Unicode string support that sup- 
ports internationalization. In 3.0, the basic str string type handles Unicode too (which 
makes sense, given that ASCII text is a simple kind of Unicode), and a bytes type 
represents raw byte strings; in 2.6, Unicode is a separate type, and str handles both 8- 
bit strings and binary data. Files are also changed in 3.0 to return and accept str for 
text and bytes for binary data. We’ll meet all these special string forms in later chapters. 
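
As a brief preview, the following sketch contrasts a normal and a raw string literal, and converts between str and bytes as Python 3.X does. The file path is made up for illustration, and ASCII is assumed as the encoding just to keep the example simple.

```python
plain = 'C:\new\text.dat'            # \n and \t are taken as escape sequences
raw = r'C:\new\text.dat'             # raw string: backslashes are kept literally

lengths = (len(plain), len(raw))     # the raw form is two characters longer

data = 'spam'.encode('ascii')        # str to bytes
text = data.decode('ascii')          # bytes back to str

print(lengths, data, text)
```

Raw strings are a common way to code Windows directory paths and pattern strings, where backslashes should be taken literally.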


Pattern Matching 


One point worth noting before we move on is that none of the string object’s methods 
support pattern-based text processing. Text pattern matching is an advanced tool out- 
side this book’s scope, but readers with backgrounds in other scripting languages may 
be interested to know that to do pattern matching in Python, we import a module called 
re. This module has analogous calls for searching, splitting, and replacement, but because
we can use patterns to specify substrings, we can be much more general:

>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
>>> match.group(1)
'Python '


This example searches for a substring that begins with the word “Hello,” followed by
zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched 
group, terminated by the word “world.” If such a substring is found, portions of the 
substring matched by parts of the pattern enclosed in parentheses are available as 
groups. The following pattern, for example, picks out three groups separated by 
slashes: 

>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')


Pattern matching is a fairly advanced text-processing tool by itself, but there is also
support in Python for even more advanced language processing, including natural lan- 
guage processing. I’ve already said enough about strings for this tutorial, though, so 
let’s move on to the next type. 
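
For readers who want to see the analogous calls mentioned earlier, here is a small sketch of pattern-based splitting, replacement, and searching with the re module. The sample strings are made up; note that re.match looks for the pattern only at the start of the text, while re.search scans the whole string.

```python
import re

parts = re.split('[,;]', 'aaa,bbb;ccc')                    # split on either delimiter
cleaned = re.sub('[ \t]+', ' ', 'Hello   Python\tworld')   # collapse runs of blanks

found = re.search('world', 'Hello Python world')           # scans the whole string
offset = found.start()                                     # where the match begins
missing = re.match('world', 'Hello Python world')          # None: not at the front

print(parts, cleaned, offset, missing)
```

The string methods split, replace, and find cover the fixed-substring cases; the re calls take over when the delimiter or target is better described by a pattern.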


Lists 


The Python list object is the most general sequence provided by the language. Lists are 
positionally ordered collections of arbitrarily typed objects, and they have no fixed size. 
They are also mutable—unlike strings, lists can be modified in-place by assignment to 
offsets as well as a variety of list method calls. 


Sequence Operations 


Because they are sequences, lists support all the sequence operations we discussed for 
strings; the only difference is that the results are usually lists instead of strings. For 
instance, given a three-item list: 


>>> L = [123, 'spam', 1.23]     # A list of three different-type objects
>>> len(L)                      # Number of items in the list
3


we can index, slice, and so on, just as for strings: 


>>> L[0]                        # Indexing by position
123
>>> L[:-1]                      # Slicing a list returns a new list
[123, 'spam']

>>> L + [4, 5, 6]               # Concatenation makes a new list too
[123, 'spam', 1.23, 4, 5, 6]

>>> L                           # We're not changing the original list
[123, 'spam', 1.23]


Type-Specific Operations 


Python’s lists are related to arrays in other languages, but they tend to be more powerful. 
For one thing, they have no fixed type constraint—the list we just looked at, for ex- 
ample, contains three objects of completely different types (an integer, a string, and a 
floating-point number). Further, lists have no fixed size. That is, they can grow and 
shrink on demand, in response to list-specific operations: 


>>> L.append('NI')              # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']

>>> L.pop(2)                    # Shrinking: delete an item in the middle
1.23
>>> L                           # "del L[2]" deletes from a list too
[123, 'spam', 'NI']


Here, the list append method expands the list’s size and inserts an item at the end; the 
pop method (or an equivalent del statement) then removes an item at a given offset, 
causing the list to shrink. Other list methods insert an item at an arbitrary position 
(insert), remove a given item by value (remove), and so on. Because lists are mutable, 
most list methods also change the list object in-place, instead of creating a new one: 

>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']

>>> M.reverse()
>>> M
['cc', 'bb', 'aa']


The list sort method here, for example, orders the list in ascending fashion by default,
and reverse reverses it—in both cases, the methods modify the list directly. 
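
One consequence of in-place change is worth a quick sketch: methods such as sort return None rather than the list, so their result should not be assigned. When a new list is wanted instead, the sorted built-in function is one alternative; it is not part of the example above and is shown here only for contrast.

```python
M = ['bb', 'aa', 'cc']
result = M.sort()                # sorts M itself; the call returns None

N = ['bb', 'aa', 'cc']
ordered = sorted(N)              # builds a new sorted list; N is left alone

print(result, M, ordered, N)
```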


Bounds Checking 


Although lists have no fixed size, Python still doesn’t allow us to reference items that 
are not present. Indexing off the end of a list is always a mistake, but so is assigning off 
the end: 


>>> L
[123, 'spam', 'NI']

>>> L[99]
...error text omitted...
IndexError: list index out of range

>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range


This is intentional, as it’s usually an error to try to assign off the end of a list (and a 
particularly nasty one in the C language, which doesn’t do as much error checking as 
Python). Rather than silently growing the list in response, Python reports an error. To 
grow a list, we call list methods such as append instead. 
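
The following sketch puts the two behaviors side by side: an assignment off the end fails, while append and the related extend method grow the list. The try statement used to catch the error here is covered later in the book.

```python
L = [123, 'spam', 'NI']

try:
    L[99] = 1                    # there is no offset 99 to assign to
except IndexError:
    L.append(1)                  # growing is done with method calls instead

L.extend([2, 3])                 # extend adds several items at the end
print(L)
```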


Nesting 


One nice feature of Python’s core data types is that they support arbitrary nesting—we 
can nest them in any combination, and as deeply as we like (for example, we can have 
a list that contains a dictionary, which contains another list, and so on). One immediate
application of this feature is to represent matrixes, or “multidimensional arrays” in 
Python. A list with nested lists will do the job for basic applications: 


>>> M = [[1, 2, 3],             # A 3 x 3 matrix, as nested lists
         [4, 5, 6],             # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


Here, we’ve coded a list that contains three other lists. The effect is to represent a 
3 x 3 matrix of numbers. Such a structure can be accessed in a variety of ways: 


>>> M[1] # Get row 2 

[4, 5, 6] 

>>> M[1][2] # Get row 2, then get item 3 within the row 
6 


The first operation here fetches the entire second row, and the second grabs the third 
item within that row. Stringing together index operations takes us deeper and deeper 
into our nested-object structure.†
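
Nesting is not limited to lists within lists. As a sketch of the "list that contains a dictionary, which contains another list" case mentioned earlier, the record layout and names below are invented purely for illustration; dictionaries are introduced later in this chapter.

```python
records = [{'name': 'Bob', 'jobs': ['dev', 'mgr']},
           {'name': 'Sue', 'jobs': ['dev']}]

second_job = records[0]['jobs'][1]   # list index, then key, then list index
records[1]['jobs'].append('cto')     # nested objects can be changed in-place

print(second_job, records[1])
```

Each index or key step simply fetches the next nested object, so the same chaining idea used for the matrix applies to mixed structures as well.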


Comprehensions 


In addition to sequence operations and list methods, Python includes a more advanced 
operation known as a list comprehension expression, which turns out to be a powerful 
way to process structures like our matrix. Suppose, for instance, that we need to extract 
the second column of our sample matrix. It’s easy to grab rows by simple indexing 


† This matrix structure works for small-scale tasks, but for more serious number crunching you will probably
want to use one of the numeric extensions to Python, such as the open source NumPy system. Such tools can
store and process large matrixes much more efficiently than our nested list structure. NumPy has been said
to turn Python into the equivalent of a free and more powerful version of the Matlab system, and organizations
such as NASA, Los Alamos, and JPMorgan Chase use this tool for scientific and financial tasks. Search the
Web for more details.


because the matrix is stored by rows, but it’s almost as easy to get a column with a list
comprehension: 


>>> col2 = [row[1] for row in M] # Collect the items in column 2 
>>> col2 

[2, 5, 8] 

>>> M # The matrix is unchanged 


[[1, 2, 3], [4, 5, 6], [7, 8, 9]] 


List comprehensions derive from set notation; they are a way to build a new list by 
running an expression on each item in a sequence, one at a time, from left to right. List 
comprehensions are coded in square brackets (to tip you off to the fact that they make 
a list) and are composed of an expression and a looping construct that share a variable 
name (row, here). The preceding list comprehension means basically what it says: “Give 
me row[1] for each row in matrix M, in a new list.” The result is a new list containing 
column 2 of the matrix. 


List comprehensions can be more complex in practice: 


>>> [row[1] + 1 for row in M]                   # Add 1 to each item in column 2
[3, 6, 9]

>>> [row[1] for row in M if row[1] % 2 == 0]    # Filter out odd items
[2, 8]


The first operation here, for instance, adds 1 to each item as it is collected, and the
second uses an if clause to filter odd numbers out of the result using the % modulus 
expression (remainder of division). List comprehensions make new lists of results, but 
they can be used to iterate over any iterable object. Here, for instance, we use list com- 
prehensions to step over a hardcoded list of coordinates and a string: 


>>> diag = [M[i][i] for i in [0, 1, 2]]         # Collect a diagonal from matrix
>>> diag
[1, 5, 9]

>>> doubles = [c * 2 for c in 'spam']           # Repeat characters in a string
>>> doubles
['ss', 'pp', 'aa', 'mm']

List comprehensions, and relatives like the map and filter built-in functions, are a bit 
too involved for me to say more about them here. The main point of this brief intro- 
duction is to illustrate that Python includes both simple and advanced tools in its ar- 
senal. List comprehensions are an optional feature, but they tend to be handy in practice 
and often provide a substantial processing speed advantage. They also work on any 
type that is a sequence in Python, as well as some types that are not. You’ll hear much 
more about them later in this book. 
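
If the syntax still seems dense, it may help to see what a comprehension stands for. The sketch below builds the same results with explicit for loops; the variable names ending in _loop are hypothetical, and the loop statements themselves are covered later in the book.

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

col2 = [row[1] for row in M]                        # comprehension form
col2_loop = []
for row in M:                                       # equivalent loop form
    col2_loop.append(row[1])

evens = [row[1] for row in M if row[1] % 2 == 0]    # with a filter clause
evens_loop = []
for row in M:
    if row[1] % 2 == 0:                             # the if clause becomes a nested test
        evens_loop.append(row[1])

print(col2 == col2_loop, evens == evens_loop)
```

The comprehension packs the loop, the optional test, and the append into a single expression, which is where much of its brevity comes from.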


As a preview, though, you’ll find that in recent Pythons, comprehension syntax in 
parentheses can also be used to create generators that produce results on demand (the 
sum built-in, for instance, sums items in a sequence): 
>>> G = (sum(row) for row in M)                 # Create a generator of row sums
>>> next(G)
6
>>> next(G)                                     # Run the iteration protocol
15


The map built-in can do similar work, by generating the results of running items through 
a function. Wrapping it in list forces it to return all its values in Python 3.0: 


>>> list(map(sum, M)) # Map sum over items in M 
[6, 15, 24] 


In Python 3.0, comprehension syntax can also be used to create sets and dictionaries: 


>>> {sum(row) for row in M} # Create a set of row sums
{24, 6, 15}

>>> {i : sum(M[i]) for i in range(3)} # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}


In fact, lists, sets, and dictionaries can all be built with comprehensions in 3.0: 


>>> [ord(x) for x in 'spaam'] # List of character ordinals
[115, 112, 97, 97, 109]

>>> {ord(x) for x in 'spaam'} # Sets remove duplicates
{112, 97, 115, 109}

>>> {x: ord(x) for x in 'spaam'} # Dictionary keys are unique
{'a': 97, 'p': 112, 's': 115, 'm': 109}


To understand objects like generators, sets, and dictionaries, though, we must move 
ahead. 


Dictionaries 


Python dictionaries are something completely different (Monty Python reference 
intended)—they are not sequences at all, but are instead known as mappings. Mappings 
are also collections of other objects, but they store objects by key instead of by relative 
position. In fact, mappings don’t maintain any reliable left-to-right order; they simply 
map keys to associated values. Dictionaries, the only mapping type in Python’s core 
objects set, are also mutable: they may be changed in-place and can grow and shrink 
on demand, like lists. 


Mapping Operations 


When written as literals, dictionaries are coded in curly braces and consist of a series 
of “key: value” pairs. Dictionaries are useful anytime we need to associate a set of values 
with keys—to describe the properties of something, for instance. As an example, con- 
sider the following three-item dictionary (with keys “food,” “quantity,” and “color”): 


>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}

We can index this dictionary by key to fetch and change the keys’ associated values.
The dictionary index operation uses the same syntax as that used for sequences, but
the item in the square brackets is a key, not a relative position:


>>> D['food'] # Fetch value of key 'food'
'Spam'

>>> D['quantity'] += 1 # Add 1 to 'quantity' value
>>> D
{'food': 'Spam', 'color': 'pink', 'quantity': 5}

Although the curly-braces literal form does see use, it is perhaps more common to see
dictionaries built up in different ways. The following code, for example, starts with an
empty dictionary and fills it out one key at a time. Unlike out-of-bounds assignments
in lists, which are forbidden, assignments to new dictionary keys create those keys:


>>> D = {}
>>> D['name'] = 'Bob' # Create keys by assignment
>>> D['job'] = 'dev'
>>> D['age'] = 40

>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}

>>> print(D['name'])
Bob

Here, we’re effectively using dictionary keys as field names in a record that describes
someone. In other applications, dictionaries can also be used to replace searching
operations; indexing a dictionary by key is often the fastest way to code a search in
Python.
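To illustrate the point about searching, here is a minimal sketch that contrasts a manual scan of a list with a dictionary key index. The data and the names pairs, find_in_list, and table are hypothetical, invented only for this example.

```python
# Hypothetical data: (name, extension) pairs
pairs = [('Bob', 1234), ('Sue', 5678), ('Ann', 9012)]

def find_in_list(name):
    # Linear search: scan the pairs until a match is found
    for (key, value) in pairs:
        if key == name:
            return value
    return None

# Build a dictionary once; each later search is a single key index
table = dict(pairs)

print(find_in_list('Sue'))   # 5678
print(table['Sue'])          # 5678
```

The dictionary version is shorter to code, and for large collections it typically avoids scanning every item.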


Nesting Revisited 


In the prior example, we used a dictionary to describe a hypothetical person, with three 
keys. Suppose, though, that the information is more complex. Perhaps we need to 
record a first name and a last name, along with multiple job titles. This leads to another 
application of Python’s object nesting in action. The following dictionary, coded all at 
once as a literal, captures more structured information: 
>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'job': ['dev', 'mgr'],
           'age': 40.5}

Here, we again have a three-key dictionary at the top (keys “name,” “job,” and “age”),
but the values have become more complex: a nested dictionary for the name to support
multiple parts, and a nested list for the job to support multiple roles and future
expansion. We can access the components of this structure much as we did for our matrix
earlier, but this time some of our indexes are dictionary keys, not list offsets:
>>> rec['name'] # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}

>>> rec['name']['last'] # Index the nested dictionary
'Smith'

>>> rec['job'] # 'job' is a nested list
['dev', 'mgr']

>>> rec['job'][-1] # Index the nested list
'mgr'

>>> rec['job'].append('janitor') # Expand Bob's job description in-place
>>> rec
{'age': 40.5, 'job': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith',
'first': 'Bob'}}

Notice how the last operation here expands the nested job list: because the job list is
a separate piece of memory from the dictionary that contains it, it can grow and shrink
freely (object memory layout will be discussed further later in this book).


The real reason for showing you this example is to demonstrate the flexibility of Py- 
thon’s core data types. As you can see, nesting allows us to build up complex infor- 
mation structures directly and easily. Building a similar structure in a low-level language 
like C would be tedious and require much more code: we would have to lay out and 
declare structures and arrays, fill out values, link everything together, and so on. In 
Python, this is all automatic—running the expression creates the entire nested object 
structure for us. In fact, this is one of the main benefits of scripting languages like 
Python. 


Just as importantly, in a lower-level language we would have to be careful to clean up 
all of the object’s space when we no longer need it. In Python, when we lose the last 
reference to the object—by assigning its variable to something else, for example—all 
of the memory space occupied by that object’s structure is automatically cleaned up 
for us: 


>>> rec = 0 # Now the object's space is reclaimed 


Technically speaking, Python has a feature known as garbage collection that cleans up
unused memory as your program runs and frees you from having to manage such details
in your code. In standard Python (CPython), the space is reclaimed immediately, as soon
as the last reference to an object is removed. We’ll study how this works later in this
book; for now, it’s enough to know that you can use objects freely, without worrying
about creating their space or cleaning up as you go.*


* Keep in mind that the rec record we just created really could be a database record, when we employ Python’s
object persistence system, an easy way to store native Python objects in files or access-by-key databases. We
won’t go into details here, but watch for discussion of Python’s pickle and shelve modules later in this book.
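As a small preview of the persistence idea in this note, the following sketch runs the rec record through the standard library's pickle module. It works in memory only (no file or database is involved), and the names data and restored are illustrative.

```python
import pickle

rec = {'name': {'first': 'Bob', 'last': 'Smith'},
       'job': ['dev', 'mgr'],
       'age': 40.5}

data = pickle.dumps(rec)         # Serialize the nested object to a bytes string
restored = pickle.loads(data)    # Rebuild an equivalent object from those bytes

print(restored == rec)           # True: same value
print(restored is rec)           # False: a distinct object in memory
```

Writing the same bytes to a file would let a later program run load the record back.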
Sorting Keys: for Loops 


As mappings, as we’ve already seen, dictionaries only support accessing items by key. 
However, they also support type-specific operations with method calls that are useful 
in a variety of common use cases. 


As mentioned earlier, because dictionaries are not sequences, they don’t maintain any 
dependable left-to-right order. This means that if we make a dictionary and print it 
back, its keys may come back in a different order than that in which we typed them: 


>>> D = {'a': 1, 'b': 2, 'c': 3} 
>>> D 
{'a': 1, 'c': 3, 'b': 2}


What do we do, though, if we do need to impose an ordering on a dictionary’s items?
One common solution is to grab a list of keys with the dictionary keys method, sort
that with the list sort method, and then step through the result with a Python for loop
(be sure to press the Enter key twice after coding the for loop below; as explained in
Chapter 3, an empty line means “go” at the interactive prompt, and the prompt changes
to “...” on some interfaces):

>>> Ks = list(D.keys()) # Unordered keys list
>>> Ks # A list in 2.6, "view" in 3.0: use list()
['a', 'c', 'b']

>>> Ks.sort() # Sorted keys list
>>> Ks
['a', 'b', 'c']

>>> for key in Ks: # Iterate through sorted keys
        print(key, '=>', D[key]) # <== press Enter twice here

a => 1
b => 2
c => 3


This is a three-step process, although, as we’ll see in later chapters, in recent versions 
of Python it can be done in one step with the newer sorted built-in function. The 
sorted call returns the result and sorts a variety of object types, in this case sorting 
dictionary keys automatically: 


>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> for key in sorted(D):
        print(key, '=>', D[key])

a => 1
b => 2
c => 3

Besides showcasing dictionaries, this use case serves to introduce the Python for loop.
The for loop is a simple and efficient way to step through all the items in a sequence
and run a block of code for each item in turn. A user-defined loop variable (key, here)
is used to reference the current item each time through. The net effect in our example
is to print the unordered dictionary’s keys and values, in sorted-key order.


The for loop, and its more general cousin the while loop, are the main ways we code 
repetitive tasks as statements in our scripts. Really, though, the for loop (like its relative 
the list comprehension, which we met earlier) is a sequence operation. It works on any 
object that is a sequence and, like the list comprehension, even on some things that are 
not. Here, for example, it is stepping across the characters in a string, printing the 
uppercase version of each as it goes: 
>>> for c in 'spam':
        print(c.upper())

S
P
A
M


Python’s while loop is a more general sort of looping tool, not limited to stepping across 
sequences: 
>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1

spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!


We'll discuss looping statements, syntax, and tools in depth later in the book. 


Iteration and Optimization 


If the last section’s for loop looks like the list comprehension expression introduced 
earlier, it should: both are really general iteration tools. In fact, both will work on any 
object that follows the iteration protocol—a pervasive idea in Python that essentially 
means a physically stored sequence in memory, or an object that generates one item at 
a time in the context of an iteration operation. An object falls into the latter category 
if it responds to the iter built-in with an object that advances in response to next. The 
generator comprehension expression we saw earlier is such an object. 
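The protocol described here can be run by hand. The sketch below calls iter and next directly on a list, then lets a for loop do the same work on a dictionary; it uses the try statement, which is covered later in the book, only to show what happens when the items run out. The names L, I, D, and keys are illustrative.

```python
L = [1, 2, 3]
I = iter(L)              # Ask the list for an iterator object
print(next(I))           # 1
print(next(I))           # 2
print(next(I))           # 3

try:
    next(I)              # No more items: StopIteration is raised
except StopIteration:
    print('done')

D = {'a': 1, 'b': 2}
keys = []
for key in iter(D):      # Dictionaries are iterable too: they yield keys
    keys.append(key)
print(sorted(keys))      # ['a', 'b']
```

A for loop performs these iter and next calls automatically, and it treats the StopIteration exception as the signal to exit.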


I’ll have more to say about the iteration protocol later in this book. For now, keep in
mind that every Python tool that scans an object from left to right uses the iteration
protocol. This is why the sorted call used in the prior section works on the dictionary
directly: we don’t have to call the keys method to get a sequence because dictionaries
are iterable objects, with a next that returns successive keys.

This also means that any list comprehension expression, such as this one, which
computes the squares of a list of numbers:

>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]


can always be coded as an equivalent for loop that builds the result list manually by
appending as it goes:

>>> squares = []
>>> for x in [1, 2, 3, 4, 5]: # This is what a list comprehension does
        squares.append(x ** 2) # Both run the iteration protocol internally

>>> squares
[1, 4, 9, 16, 25]


The list comprehension, though, and related functional programming tools like map 
and filter, will generally run faster than a for loop today (perhaps even twice as fast) — 
a property that could matter in your programs for large data sets. Having said that, 
though, I should point out that performance measures are tricky business in Python 
because it optimizes so much, and performance can vary from release to release. 


A major rule of thumb in Python is to code for simplicity and readability first and worry 
about performance later, after your program is working, and after you’ve proved that 
there is a genuine performance concern. More often than not, your code will be quick 
enough as it is. If you do need to tweak code for performance, though, Python includes 
tools to help you out, including the time and timeit modules and the profile module. 
You'll find more on these later in this book, and in the Python manuals. 
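As a hedged sketch of how such a measurement might look, the following uses the standard library's timeit module to time a for loop against a list comprehension. The function names, the input size of 1,000, and the repeat count of 200 are arbitrary choices for illustration, and the numbers printed will differ from machine to machine and release to release.

```python
import timeit

def with_loop():
    res = []
    for x in range(1000):
        res.append(x ** 2)
    return res

def with_comprehension():
    return [x ** 2 for x in range(1000)]

# Total seconds for 200 calls of each function
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print('loop: %.4f  comprehension: %.4f' % (loop_time, comp_time))
```

Timing both alternatives on your own machine and Python version is generally more reliable than relying on a rule of thumb.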


Missing Keys: if Tests 


One other note about dictionaries before we move on. Although we can assign to a new 
key to expand a dictionary, fetching a nonexistent key is still a mistake: 


>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> D['e'] = 99 # Assigning new keys grows dictionaries
>>> D
{'a': 1, 'c': 3, 'b': 2, 'e': 99}

>>> D['f'] # Referencing a nonexistent key is an error
...error text omitted...
KeyError: 'f'


This is what we want; it’s usually a programming error to fetch something that isn’t
really there. But in some generic programs, we can’t always know what keys will be
present when we write our code. How do we handle such cases and avoid errors? One
trick is to test ahead of time. The dictionary in membership expression allows us to
query the existence of a key and branch on the result with a Python if statement (as
with the for, be sure to press Enter twice to run the if interactively here):

>>> 'f' in D
False

>>> if not 'f' in D:
        print('missing')

missing


I’ll have much more to say about the if statement and statement syntax in general later
in this book, but the form we’re using here is straightforward: it consists of the word 
if, followed by an expression that is interpreted as a true or false result, followed by a 
block of code to run if the test is true. In its full form, the if statement can also have 
an else clause for a default case, and one or more elif (else if) clauses for other tests. 
It’s the main selection tool in Python, and it’s the way we code logic in our scripts. 


Still, there are other ways to create dictionaries and avoid accessing nonexistent keys: 
the get method (a conditional index with a default); the Python 2.X has_key method 
(which is no longer available in 3.0); the try statement (a tool we'll first meet in Chap- 
ter 10 that catches and recovers from exceptions altogether); and the if/else expression 
(essentially, an if statement squeezed onto a single line). Here are a few examples: 

>>> value = D.get('x', 0) # Index but with a default
>>> value
0

>>> value = D['x'] if 'x' in D else 0 # if/else expression form
>>> value
0
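The try statement mentioned above is not shown in these examples. As a minimal sketch (the dictionary contents and the default of 0 are assumptions for illustration), it might be coded like this:

```python
D = {'a': 1, 'b': 2, 'c': 3}

try:
    value = D['x']          # Raises KeyError: 'x' is not a key
except KeyError:
    value = 0               # Recover with a default instead

print(value)                # 0
print(D.get('a', 0))        # 1: get returns the stored value when the key exists
```

Unlike the in test, this form attempts the fetch first and handles the failure only if one occurs.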


We'll save the details on such alternatives until a later chapter. For now, let’s move on 
to tuples. 


Tuples 


The tuple object (pronounced “toople” or “tuhple,” depending on who you ask) is 
roughly like a list that cannot be changed—tuples are sequences, like lists, but they are 
immutable, like strings. Syntactically, they are coded in parentheses instead of square 
brackets, and they support arbitrary types, arbitrary nesting, and the usual sequence 
operations: 


>>> T = (1, 2, 3, 4) # A 4-item tuple
>>> len(T) # Length
4

>>> T + (5, 6) # Concatenation
(1, 2, 3, 4, 5, 6)

>>> T[0] # Indexing, slicing, and more
1

Tuples also have two type-specific callable methods in Python 3.0, but not nearly as
many as lists:


>>> T.index(4) # Tuple methods: 4 appears at offset 3 
3 

>>> T.count(4) # 4 appears once 

1 


The primary distinction for tuples is that they cannot be changed once created. That 
is, they are immutable sequences: 
>>> T[0] = 2 # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment


Like lists and dictionaries, tuples support mixed types and nesting, but they don’t grow 
and shrink because they are immutable: 

>>> T = ('spam', 3.0, [11, 22, 33])
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'


Why Tuples? 


So, why have a type that is like a list, but supports fewer operations? Frankly, tuples 
are not generally used as often as lists in practice, but their immutability is the whole 
point. If you pass a collection of objects around your program as a list, it can be changed 
anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity con- 
straint that is convenient in programs larger than those we’ll write here. We’ll talk more 
about tuples later in the book. For now, though, let’s jump ahead to our last major core 
type: the file. 
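Before moving on, the integrity constraint described above can be sketched in a few lines. The function name careless and the sample data are hypothetical; the try statement, covered later in the book, is used here only to keep the example running after the error.

```python
def careless(items):
    items[0] = 'changed'        # In-place change: allowed for lists only

as_list = ['spam', 'eggs']
careless(as_list)
print(as_list)                  # ['changed', 'eggs']: the caller's list was altered

as_tuple = ('spam', 'eggs')
try:
    careless(as_tuple)          # Item assignment fails for a tuple
except TypeError as exc:
    print('refused:', exc)
print(as_tuple)                 # ('spam', 'eggs'): unchanged
```

The list version silently changes the caller's object, while the tuple version reports the attempted change as an error.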


Files 


File objects are Python code’s main interface to external files on your computer. Files 
are a core type, but they’re something of an oddball—there is no specific literal syntax 
for creating them. Rather, to create a file object, you call the built-in open function, 
passing in an external filename and a processing mode as strings. For example, to create 
a text output file, you would pass in its name and the 'w' processing mode string to 
write data: 


>>> f = open('data.txt', 'w') # Make a new file in output mode
>>> f.write('Hello\n') # Write strings of characters to it
6
>>> f.write('world\n') # Returns number of characters written in Python 3.0
6
>>> f.close() # Close to flush output buffers to disk

This creates a file in the current directory and writes text to it (the filename can be a
full directory path if you need to access a file elsewhere on your computer). To read
back what you just wrote, reopen the file in 'r' processing mode, for reading text
input; this is the default if you omit the mode in the call. Then read the file’s content
into a string, and display it. A file’s contents are always a string in your script, regardless
of the type of data the file contains:


>>> f = open('data.txt') # 'r' is the default processing mode
>>> text = f.read() # Read entire file into a string
>>> text
'Hello\nworld\n'

>>> print(text) # print interprets control characters
Hello
world

>>> text.split() # File content is always a string
['Hello', 'world']


Other file object methods support additional features we don’t have time to cover here. 
For instance, file objects provide more ways of reading and writing (read accepts an 
optional byte size, readline reads one line at a time, and so on), as well as other tools 
(seek moves to a new file position). As we’ll see later, though, the best way to read a
file today is to not read it at all: files provide an iterator that automatically reads line
by line in for loops and other contexts.
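As a short sketch of the file iterator, the following writes the same two lines used above and reads them back in a for loop. It stores the file in a temporary directory, which is an assumption made here so that the example does not disturb your working directory; the names path and lines are illustrative.

```python
import os
import tempfile

# Write a small file to a temporary directory (the location is illustrative only)
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
f = open(path, 'w')
f.write('Hello\n')
f.write('world\n')
f.close()

lines = []
f = open(path)
for line in f:                   # The file iterator reads one line per step
    lines.append(line.rstrip())  # Strip the trailing end-of-line character
f.close()
print(lines)                     # ['Hello', 'world']
```

Because only one line is loaded at a time, this form also works for files too large to fit in memory all at once.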
3. If the user interactively types the name of a module to test, how can you import it? 


4. How is changing sys.path different from setting PYTHONPATH to modify the module 
search path? 


5. If the module __future__ allows us to import from the future, can we also import 
from the past? 


Test Your Knowledge: Answers 


1. Variables at the top level of a module whose names begin with a single underscore 
are not copied out to the importing scope when the from * statement form is used. 
They can still be accessed by an import or the normal from statement form, though. 


2. If a module’s __name__ variable is the string "__main__", it means that the file is 
being executed as a top-level script instead of being imported from another file in 
the program. That is, the file is being used as a program, not a library. 


3. User input usually comes into a script as a string; to import the referenced module 
given its string name, you can build and run an import statement with exec, or pass 
the string name in a call to the __import__ function. 


4. Changing sys.path only affects one running program, and is temporary—the 
change goes away when the program ends. PYTHONPATH settings live in the operating 
system—they are picked up globally by all programs on a machine, and changes 
to these settings endure after programs exit. 


5. No, we can’t import from the past in Python. We can install (or stubbornly use) 
an older version of the language, but the latest Python is generally the best Python. 
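To make answer 3 more concrete, here is a minimal sketch of importing a module from its string name. The variable modname stands in for text typed by a user, and the standard library's string module is used only as a convenient target. The importlib.import_module call is an additional option not named in the answer above, and is available in Python 3.1 and later.

```python
import importlib
import sys

modname = 'string'                       # Stands in for text typed by a user

mod1 = __import__(modname)               # Built-in function form
mod2 = importlib.import_module(modname)  # Library call form (3.1 and later)

ns = {}
exec('import ' + modname, ns)            # Build and run an import statement
mod3 = ns[modname]                       # The statement bound the name in ns

print(mod1 is mod2 is mod3)              # True: all refer to the same module
print(mod1 is sys.modules[modname])      # True: fetched from the modules table
```

All three forms return the single module object that Python caches in sys.modules.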


Test Your Knowledge: Part V Exercises 
See “Part V, Modules” on page 1119 in Appendix B for the solutions. 


1. Import basics. Write a program that counts the lines and characters in a file (similar 
in spirit to wc on Unix). With your text editor, code a Python module called 
mymod.py that exports three top-level names: 


* A countLines(name) function that reads an input file and counts the number of
lines in it (hint: file.readlines does most of the work for you, and len does the
rest).


* A countChars(name) function that reads an input file and counts the number of
characters in it (hint: file.read returns a single string).


* A test(name) function that calls both counting functions with a given input
filename. Such a filename generally might be passed in, hardcoded, input with
the input built-in function, or pulled from a command line via the sys.argv list
shown in this chapter’s formats.py example; for now, you can assume it’s a
passed-in function argument.


All three mymod functions should expect a filename string to be passed in. If you 
type more than two or three lines per function, you’re working much too hard— 
use the hints I just gave! 


Next, test your module interactively, using import and attribute references to fetch 
your exports. Does your PYTHONPATH need to include the directory where you created 
mymod.py? Try running your module on itself: e.g., test("mymod.py"). Note that 
test opens the file twice; if you’re feeling ambitious, you may be able to improve 
this by passing an open file object into the two count functions (hint: 
file.seek(0) is a file rewind). 


2. from/from *. Test your mymod module from exercise 1 interactively by using from to 
load the exports directly, first by name, then using the from * variant to fetch 
everything. 

3. __main__. Add a line in your mymod module that calls the test function automati- 
cally only when the module is run as a script, not when it is imported. The line you 
add will probably test the value of __name__ for the string "__main__", as shown in 
this chapter. Try running your module from the system command line; then, im- 
port the module and test its functions interactively. Does it still work in both 
modes? 


4. Nested imports. Write a second module, myclient.py, that imports mymod and tests 
its functions; then run myclient from the system command line. If myclient uses 
from to fetch from mymod, will mymod’s functions be accessible from the top level of 
myclient? What if it imports with import instead? Try coding both variations in 
myclient and test interactively by importing myclient and inspecting its __dict__ 
attribute. 


5. Package imports. Import your file from a package. Create a subdirectory called 
mypkg nested in a directory on your module import search path, move the 
mymod.py module file you created in exercise 1 or 3 into the new directory, and 
try to import it with a package import of the form import mypkg.mymod. 


You'll need to add an __init__.py file in the directory your module was moved to in
order to make this go, but it should work on all major Python platforms (that’s part of
the reason Python uses “.” as a path separator). The package directory you create can
be simply a subdirectory of the one you’re working in; if it is, it will be found via
the home directory component of the search path, and you won’t have to configure
your path. Add some code to your __init__.py, and see if it runs on each import.


6. Reloads. Experiment with module reloads: perform the tests in Chapter 22’s 
changer.py example, changing the called function’s message and/or behavior re- 
peatedly, without stopping the Python interpreter. Depending on your system, you 
might be able to edit changer in another window, or suspend the Python interpreter 
and edit in the same window (on Unix, a Ctrl-Z key combination usually suspends 
the current process, and an fg command later resumes it). 
7. Circular imports.* In the section on recursive import gotchas, importing recur1 
raised an error. But if you restart Python and import recur2 interactively, the error 
doesn’t occur—test this and see for yourself. Why do you think it works to import 
recur2, but not recur1? (Hint: Python stores new modules in the built-in 
sys.modules table—a dictionary—before running their code; later imports fetch 
the module from this table first, whether the module is “complete” yet or not.) 
Now, try running recur1 as a top-level script file: python recur1.py. Do you get the 
same error that occurs when recur1 is imported interactively? Why? (Hint: when 
modules are run as programs, they aren’t imported, so this case has the same effect 
as importing recur2 interactively; recur2 is the first module imported.) What hap- 
pens when you run recur2 as a script? 


* Note that circular imports are extremely rare in practice. On the other hand, if you can understand why they 
are a potential problem, you know a lot about Python’s import semantics. 
PART VI 


Classes and OOP 


CHAPTER 25 


OOP: The Big Picture 


So far in this book, we’ve been using the term “object” generically. Really, the code 
written up to this point has been object-based: we’ve passed objects around our scripts, 
used them in expressions, called their methods, and so on. For our code to qualify as 
being truly object-oriented (OO), though, our objects will generally need to also par- 
ticipate in something called an inheritance hierarchy. 


This chapter begins our exploration of the Python class—a device used to implement 
new kinds of objects in Python that support inheritance. Classes are Python’s main 
object-oriented programming (OOP) tool, so we’ll also look at OOP basics along the 
way in this part of the book. OOP offers a different and often more effective way of 
looking at programming, in which we factor code to minimize redundancy, and write 
new programs by customizing existing code instead of changing it in-place. 


In Python, classes are created with a new statement: the class statement. As you'll see, 
the objects defined with classes can look a lot like the built-in types we studied earlier 
in the book. In fact, classes really just apply and extend the ideas we’ve already covered; 
roughly, they are packages of functions that use and process built-in object types. 
Classes, though, are designed to create and manage new objects, and they also support 
inheritance—a mechanism of code customization and reuse above and beyond any- 
thing we’ve seen so far. 


One note up front: in Python, OOP is entirely optional, and you don’t need to use 
classes just to get started. In fact, you can get plenty of work done with simpler con- 
structs such as functions, or even simple top-level script code. Because using classes 
well requires some up-front planning, they tend to be of more interest to people who 
work in strategic mode (doing long-term product development) than to people who 
work in tactical mode (where time is in very short supply). 


Still, as you'll see in this part of the book, classes turn out to be one of the most useful 
tools Python provides. When used well, classes can actually cut development time 
radically. They’re also employed in popular Python tools like the tkinter GUI API, so 
most Python programmers will usually find at least a working knowledge of class basics 
helpful. 
Why Use Classes? 


Remember when I told you that programs “do things with stuff”? In simple terms, 
classes are just a way to define new sorts of stuff, reflecting real objects in a program’s 
domain. For instance, suppose we decide to implement that hypothetical pizza-making 
robot we used as an example in Chapter 16. If we implement it using classes, we can 
model more of its real-world structure and relationships. Two aspects of OOP prove 
useful here: 


Inheritance 
Pizza-making robots are kinds of robots, so they possess the usual robot-y prop- 
erties. In OOP terms, we say they “inherit” properties from the general category 
of all robots. These common properties need to be implemented only once for the 
general case and can be reused by all types of robots we may build in the future. 


Composition 
Pizza-making robots are really collections of components that work together as a 
team. For instance, for our robot to be successful, it might need arms to roll dough, 
motors to maneuver to the oven, and so on. In OOP parlance, our robot is an 
example of composition; it contains other objects that it activates to do its bidding. 
Each component might be coded as a class, which defines its own behavior and 
relationships. 


General OOP ideas like inheritance and composition apply to any application that can 
be decomposed into a set of objects. For example, in typical GUI systems, interfaces 
are written as collections of widgets—buttons, labels, and so on—which are all drawn 
when their container is drawn (composition). Moreover, we may be able to write our 
own custom widgets—buttons with unique fonts, labels with new color schemes, and 
the like—which are specialized versions of more general interface devices (inheritance). 
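As an early preview, the pizza-making robot can be sketched in code. The details of the class statement come later in this part of the book, and every class and method name below (Robot, Arm, Motor, PizzaRobot, describe, roll, drive, work) is invented for illustration rather than taken from an example in the text.

```python
class Robot:
    def describe(self):                  # Common behavior, coded only once
        return 'I am a ' + self.__class__.__name__

class Arm:
    def roll(self):
        return 'rolling dough'

class Motor:
    def drive(self, place):
        return 'moving to ' + place

class PizzaRobot(Robot):                 # Inheritance: a kind of Robot
    def __init__(self):
        self.arm = Arm()                 # Composition: embedded components
        self.motor = Motor()
    def work(self):
        return [self.arm.roll(), self.motor.drive('oven')]

bot = PizzaRobot()
print(bot.describe())                    # I am a PizzaRobot
print(bot.work())                        # ['rolling dough', 'moving to oven']
```

Here describe is inherited from the general Robot class, while the arm and motor are separate objects that the pizza robot contains and directs.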


From a more concrete programming perspective, classes are Python program units, just 
like functions and modules: they are another compartment for packaging logic and 
data. In fact, classes also define new namespaces, much like modules. But, compared 
to other program units we’ve already seen, classes have three critical distinctions that 
make them more useful when it comes to building new objects: 


Multiple instances 
Classes are essentially factories for generating one or more objects. Every time we 
call a class, we generate a new object with a distinct namespace. Each object gen- 
erated from a class has access to the class’s attributes and gets a namespace of its 
own for data that varies per object. 


Customization via inheritance 
Classes also support the OOP notion of inheritance; we can extend a class by re- 
defining its attributes outside the class itself. More generally, classes can build up 
namespace hierarchies, which define names to be used by objects created from 
classes in the hierarchy. 
Operator overloading 
By providing special protocol methods, classes can define objects that respond to 
the sorts of operations we saw at work on built-in types. For instance, objects made 
with classes can be sliced, concatenated, indexed, and so on. Python provides 
hooks that classes can use to intercept and implement any built-in type operation. 


OOP from 30,000 Feet 


Before we see what this all means in terms of code, I’d like to say a few words about 
the general ideas behind OOP. If you’ve never done anything object-oriented in your 
life before now, some of the terminology in this chapter may seem a bit perplexing on 
the first pass. Moreover, the motivation for these terms may be elusive until you’ve had 
a chance to study the ways that programmers apply them in larger systems. OOP is as 
much an experience as a technology. 


Attribute Inheritance Search 


The good news is that OOP is much simpler to understand and use in Python than in 
other languages, such as C++ or Java. As a dynamically typed scripting language, Py- 
thon removes much of the syntactic clutter and complexity that clouds OOP in other 
tools. In fact, most of the OOP story in Python boils down to this expression: 


object.attribute 


We’ve been using this expression throughout the book to access module attributes, call 
methods of objects, and so on. When we say this to an object that is derived from a 
class statement, however, the expression kicks off a search in Python—it searches a 
tree of linked objects, looking for the first appearance of attribute that it can find. 
When classes are involved, the preceding Python expression effectively translates to 
the following in natural language: 


Find the first occurrence of attribute by looking in object, then in all classes above it, 
from bottom to top and left to right. 


In other words, attribute fetches are simply tree searches. The term inheritance is applied 
because objects lower in a tree inherit attributes attached to objects higher in that 
tree. As the search proceeds from the bottom up, in a sense, the objects linked into a 
tree are the union of all the attributes defined in all their tree parents, all the way up 
the tree. 
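
The search just described can be approximated in a few lines of Python. What follows is only a minimal sketch, not Python's actual implementation: the function name find_attribute and the classes Top and Bottom are hypothetical, and the sketch ignores refinements such as descriptors and __getattr__. It relies on the built-in __mro__ attribute, which lists a class and its superclasses in the order Python searches them.

```python
def find_attribute(obj, name):
    """Return the first value bound to name in obj or in the classes above it."""
    # Step 1: look in the instance's own namespace
    if name in vars(obj):
        return vars(obj)[name]
    # Step 2: climb the class tree in Python's search order
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    # Nothing found anywhere in the tree
    raise AttributeError(name)


class Top:                 # hypothetical superclass
    color = 'red'

class Bottom(Top):         # hypothetical subclass
    size = 10

item = Bottom()
item.label = 'mine'

print(find_attribute(item, 'label'))   # found in the instance itself
print(find_attribute(item, 'size'))    # found in Bottom
print(find_attribute(item, 'color'))   # found in Top
```

For simple trees like this one, the function returns the same values as item.label, item.size, and item.color.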


In Python, this is all very literal: we really do build up trees of linked objects with code, 
and Python really does climb this tree at runtime searching for attributes every time we 
use the object.attribute expression. To make this more concrete, Figure 25-1 sketches 
an example of one of these trees. 


Figure 25-1. A class tree, with two instances at the bottom (I1 and I2), a class above them (C1), and 
two superclasses at the top (C2 and C3). All of these objects are namespaces (packages of variables), 
and the inheritance search is simply a search of the tree from bottom to top looking for the lowest 
occurrence of an attribute name. Code implies the shape of such trees. 


In this figure, there is a tree of five objects labeled with variables, all of which have 
attached attributes, ready to be searched. More specifically, this tree links together three 
class objects (the ovals C1, C2, and C3) and two instance objects (the rectangles I1 and 
I2) into an inheritance search tree. Notice that in the Python object model, classes and 
the instances you generate from them are two distinct object types: 


Classes 
Serve as instance factories. Their attributes provide behavior—data and 
functions—that is inherited by all the instances generated from them (e.g., a func- 
tion to compute an employee’s salary from pay and hours). 


Instances 
Represent the concrete items in a program’s domain. Their attributes record data 
that varies per specific object (e.g., an employee’s Social Security number). 


In terms of search trees, an instance inherits attributes from its class, and a class inherits 
attributes from all classes above it in the tree. 


In Figure 25-1, we can further categorize the ovals by their relative positions in the tree. 
We usually call classes higher in the tree (like C2 and C3) superclasses; classes lower in 
the tree (like C1) are known as subclasses.* These terms refer to relative tree positions 
and roles. Superclasses provide behavior shared by all their subclasses, but because the 
search proceeds from the bottom up, subclasses may override behavior defined in their 
superclasses by redefining superclass names lower in the tree. 


As these last few words are really the crux of the matter of software customization in 
OOP, let’s expand on this concept. Suppose we build up the tree in Figure 25-1, and 
then say this: 


I2.w 


* In other literature, you may also occasionally see the terms base classes and derived classes used to describe 
superclasses and subclasses, respectively. 
Right away, this code invokes inheritance. Because this is an object.attribute expression, 
it triggers a search of the tree in Figure 25-1: Python will search for the attribute 
w by looking in I2 and above. Specifically, it will search the linked objects in this order: 


I2, C1, C2, C3 


and stop at the first attached w it finds (or raise an error if w isn’t found at all). In this 
case, w won't be found until C3 is searched because it appears only in that object. In 
other words, I2.w resolves to C3.w by virtue of the automatic search. In OOP terminology, 
I2 “inherits” the attribute w from C3. 


Ultimately, the two instances inherit four attributes from their classes: w, x, y, and z. 
Other attribute references will wind up following different paths in the tree. For 
example: 


• I1.x and I2.x both find x in C1 and stop because C1 is lower than C2. 
• I1.y and I2.y both find y in C1 because that’s the only place y appears. 
• I1.z and I2.z both find z in C2 because C2 is further to the left than C3. 
• I2.name finds name in I2 without climbing the tree at all. 


Trace these searches through the tree in Figure 25-1 to get a feel for how inheritance 
searches work in Python. 


The first item in the preceding list is perhaps the most important to notice—because 
C1 redefines the attribute x lower in the tree, it effectively replaces the version above it 
in C2. As you’ll see in a moment, such redefinitions are at the heart of software cus- 
tomization in OOP—by redefining and replacing the attribute, C1 effectively customizes 
what it inherits from its superclasses. 
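
To check these traces by running them, here is one way the tree might be coded. This is a sketch under assumptions: the attribute placements follow the searches listed above, but the string values are placeholders I've made up so that each result shows where it was found.

```python
class C2:
    x = 'x from C2'
    z = 'z from C2'

class C3:
    w = 'w from C3'
    z = 'z from C3'

class C1(C2, C3):              # C2 is listed first, so it is searched before C3
    x = 'x from C1'            # redefines (replaces) the x in C2
    y = 'y from C1'

I1 = C1()
I2 = C1()
I1.name = 'name from I1'       # per-instance data
I2.name = 'name from I2'

for attr in ('w', 'x', 'y', 'z', 'name'):
    print(attr, '=>', getattr(I2, attr))
```

Each printed value names the object in which the search stopped, so the output can be compared directly with the list above.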


Classes and Instances 


Although they are technically two separate object types in the Python model, the classes 
and instances we put in these trees are almost identical—each type’s main purpose is 
to serve as another kind of namespace—a package of variables, and a place where we 
can attach attributes. If classes and instances therefore sound like modules, they should; 
however, the objects in class trees also have automatically searched links to other 
namespace objects, and classes correspond to statements, not entire files. 


The primary difference between classes and instances is that classes are a kind of fac- 
tory for generating instances. For example, in a realistic application, we might have an 
Employee class that defines what it means to be an employee; from that class, we generate 
actual Employee instances. This is another difference between classes and modules: we 
only ever have one instance of a given module in memory (that’s why we have to reload 
a module to get its new code), but with classes, we can make as many instances as we 
need. 
Operationally, classes will usually have functions attached to them (e.g., 
computeSalary), and the instances will have more basic data items used by the class’s 
functions (e.g., hoursWorked). In fact, the object-oriented model is not that different 
from the classic data-processing model of programs plus records; in OOP, instances are 
like records with “data,” and classes are the “programs” for processing those records. 
In OOP, though, we also have the notion of an inheritance hierarchy, which supports 
software customization better than earlier models. 


Class Method Calls 


In the prior section, we saw how the attribute reference I2.w in our example class tree 
was translated to C3.w by the inheritance search procedure in Python. Perhaps just as 
important to understand as the inheritance of attributes, though, is what happens when 
we try to call methods (i.e., functions attached to classes as attributes). 


If this I2.w reference is a function call, what it really means is “call the C3.w function to 
process I2.” That is, Python will automatically map the call I2.w() into the call 
C3.w(I2), passing in the instance as the first argument to the inherited function. 


In fact, whenever we call a function attached to a class in this fashion, an instance of 
the class is always implied. This implied subject or context is part of the reason we refer 
to this as an object-oriented model—there is always a subject object when an operation 
is run. In a more realistic example, we might invoke a method called giveRaise attached 
as an attribute to an Employee class; such a call has no meaning unless qualified with 
the employee to whom the raise should be given. 


As we’ll see later, Python passes in the implied instance to a special first argument 
in the method, called self by convention. As we'll also learn, methods can be 
called through either an instance (e.g., bob.giveRaise()) or a class (e.g., 
Employee.giveRaise(bob)), and both forms serve purposes in our scripts. To see how 
methods receive their subjects, though, we need to move on to some code. 
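
As a preview, the two call forms can be compared in a small sketch. The amount argument and the pay attribute are assumptions added for illustration; the text's giveRaise takes no extra arguments.

```python
class Employee:
    def giveRaise(self, amount):       # amount is a hypothetical parameter
        # self is the instance the call was qualified with
        self.pay += amount

bob = Employee()
bob.pay = 1000                         # hypothetical starting pay

bob.giveRaise(100)                     # instance call: Python passes bob as self
Employee.giveRaise(bob, 100)           # class call: we pass bob ourselves
print(bob.pay)                         # prints 1200
```

Both calls run the same function on the same subject; only the way the instance reaches self differs.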


Coding Class Trees 


Although we are speaking in the abstract here, there is tangible code behind all these 
ideas. We construct trees and their objects with class statements and class calls, which 
we'll meet in more detail later. In short: 

• Each class statement generates a new class object. 
• Each time a class is called, it generates a new instance object. 
• Instances are automatically linked to the classes from which they are created. 
• Classes are linked to their superclasses by listing them in parentheses in a class 
header line; the left-to-right order there gives the order in the tree. 
To build the tree in Figure 25-1, for example, we would run Python code of this form 
(I’ve omitted the guts of the class statements here): 


class C2: ...                        # Make class objects (ovals) 
class C3: ... 
class C1(C2, C3): ...                # Linked to superclasses 

I1 = C1()                            # Make instance objects (rectangles) 
I2 = C1()                            # Linked to their classes 


Here, we build the three class objects by running three class statements, and make the 
two instance objects by calling the class C1 twice, as though it were a function. The 
instances remember the class they were made from, and the class C1 remembers its listed 
superclasses. 


Technically, this example is using something called multiple inheritance, which simply 
means that a class has more than one superclass above it in the class tree. In Python, if 
there is more than one superclass listed in parentheses in a class statement (like C1’s 
here), their left-to-right order gives the order in which those superclasses will be 
searched for attributes. 
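
The links that the instances and classes "remember" can be inspected directly. The following is a small sketch that uses Python's built-in __class__ and __bases__ attributes; the class bodies are left empty with pass.

```python
class C2: pass
class C3: pass
class C1(C2, C3): pass               # superclasses listed left to right

I1 = C1()
I2 = C1()

print(I1.__class__ is C1)            # True: the instance-to-class link
print(C1.__bases__ == (C2, C3))      # True: class-to-superclass links, in order
print(I1 is I2)                      # False: two distinct instance objects
```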


Because of the way inheritance searches proceed, the object to which you attach an 
attribute turns out to be crucial—it determines the name’s scope. Attributes attached 
to instances pertain only to those single instances, but attributes attached to classes are 
shared by all their subclasses and instances. Later, we’ll study the code that hangs 
attributes on these objects in depth. As we’ll find: 


• Attributes are usually attached to classes by assignments made within class statements, 
and not nested inside function def statements. 


• Attributes are usually attached to instances by assignments to a special argument 
passed to functions inside classes, called self. 
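
The difference in scope between the two kinds of assignment can be seen in a short sketch. The class name Worker and the company attribute are hypothetical, chosen only to illustrate the point.

```python
class Worker:
    company = 'Acme'                 # assigned in the class statement: class attribute

    def setname(self, who):
        self.name = who              # assigned through self: instance attribute

a = Worker()
b = Worker()
a.setname('bob')
b.setname('mel')

print(a.company, b.company)          # both find company in the class
print(a.name, b.name)                # each instance has its own name

Worker.company = 'Initech'           # change the class attribute once...
print(a.company, b.company)          # ...and every instance sees the change
```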


For example, classes provide behavior for their instances with functions created by 
coding def statements inside class statements. Because such nested defs assign names 
within the class, they wind up attaching attributes to the class object that will be in- 
herited by all instances and subclasses: 


class C1(C2, C3):                    # Make and link class C1 
    def setname(self, who):          # Assign name: C1.setname 
        self.name = who              # Self is either I1 or I2 

I1 = C1()                            # Make two instances 
I2 = C1() 
I1.setname('bob')                    # Sets I1.name to 'bob' 
I2.setname('mel')                    # Sets I2.name to 'mel' 
print(I1.name)                       # Prints 'bob' 
There’s nothing syntactically unique about def in this context. Operationally, when a 
def appears inside a class like this, it is usually known as a method, and it automatically 
receives a special first argument (called self by convention) that provides a handle 
back to the instance to be processed.† 


Because classes are factories for multiple instances, their methods usually go through 
this automatically passed-in self argument whenever they need to fetch or set attributes 
of the particular instance being processed by a method call. In the preceding code, 
self is used to store a name in one of two instances. 


Like simple variables, attributes of classes and instances are not declared ahead of time, 
but spring into existence the first time they are assigned values. When a method assigns 
to a self attribute, it creates or changes an attribute in an instance at the bottom of the 
class tree (i.e., one of the rectangles) because self automatically refers to the instance 
being processed. 
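
To watch an attribute spring into existence, we can inspect an instance's namespace with the built-in vars function before and after an assignment. This is a minimal sketch; to keep it self-contained, C1 is coded here without its superclasses, and the age attribute is a hypothetical extra.

```python
class C1:
    def setname(self, who):
        self.name = who

I1 = C1()
print(vars(I1))                      # {}: a new instance starts out empty
I1.setname('bob')
print(vars(I1))                      # {'name': 'bob'}: created by the assignment

I1.age = 40                          # assignments outside the class work the same way
print(sorted(vars(I1)))              # ['age', 'name']
```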


In fact, because all the objects in class trees are just namespace objects, we can fetch or 
set any of their attributes by going through the appropriate names. Saying C1.setname 
is as valid as saying I1.setname, as long as the names C1 and I1 are in your code’s scopes. 


As currently coded, our C1 class doesn’t attach a name attribute to an instance until the 
setname method is called. In fact, referencing I1.name before calling I1.setname would 
raise an undefined name error (an AttributeError exception). If a class wants to guarantee 
that an attribute like name is always set in its instances, it more typically will fill out 
the attribute at construction time, like this: 


class C1(C2, C3): 
    def __init__(self, who):         # Set name when constructed 
        self.name = who              # Self is either I1 or I2 

I1 = C1('bob')                       # Sets I1.name to 'bob' 
I2 = C1('mel')                       # Sets I2.name to 'mel' 
print(I1.name)                       # Prints 'bob' 


If it’s coded and inherited, Python automatically calls a method named __init__ each 
time an instance is generated from a class. The new instance is passed in to the self 
argument of __init__ as usual, and any values listed in parentheses in the class call go 
to arguments two and beyond. The effect here is to initialize instances when they are 
made, without requiring extra method calls. 


The __init__ method is known as the constructor because of when it is run. It’s the 
most commonly used representative of a larger class of methods called operator overloading 
methods, which we’ll discuss in more detail in the chapters that follow. Such 
methods are inherited in class trees as usual and have double underscores at the start 
and end of their names to make them distinct. Python runs them automatically when 
instances that support them appear in the corresponding operations, and they are 
mostly an alternative to using simple method calls. They’re also optional: if omitted, 
the operations are not supported. 


† If you’ve ever used C++ or Java, you’ll recognize that Python’s self is the same as the this pointer, but 
self is always explicit in Python to make attribute accesses more obvious. 


For example, to implement set intersection, a class might either provide a method 
named intersect, or overload the & expression operator to dispatch to the required 
logic by coding a method named __and__. Because the operator scheme makes instances 
look and feel more like built-in types, it allows some classes to provide a consistent and 
natural interface, and be compatible with code that expects a built-in type. 
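
Here is a minimal sketch of that choice. The class name MySet and its list-based storage are assumptions made for illustration; the point is only that __and__ lets the & operator dispatch to the same logic as the named method.

```python
class MySet:
    def __init__(self, items):
        self.items = list(items)

    def intersect(self, other):
        # Keep the items that also appear in the other set
        return MySet(x for x in self.items if x in other.items)

    def __and__(self, other):
        # Run automatically for the expression: self & other
        return self.intersect(other)

a = MySet([1, 2, 3, 4])
b = MySet([3, 4, 5])

print(a.intersect(b).items)          # method call form: [3, 4]
print((a & b).items)                 # operator form: [3, 4]
```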


OOP Is About Code Reuse 


And that, along with a few syntax details, is most of the OOP story in Python. Of course, 
there’s a bit more to it than just inheritance. For example, operator overloading is much 
more general than I’ve described so far—classes may also provide their own imple- 
mentations of operations such as indexing, fetching attributes, printing, and more. By 
and large, though, OOP is about looking up attributes in trees. 


So why would we be interested in building and searching trees of objects? Although it 
takes some experience to see how, when used well, classes support code reuse in ways 
that other Python program components cannot. With classes, we code by customizing 
existing software, instead of either changing existing code in-place or starting from 
scratch for each new project. 


At a fundamental level, classes are really just packages of functions and other names, 
much like modules. However, the automatic attribute inheritance search that we get 
with classes supports customization of software above and beyond what we can do 
with modules and functions. Moreover, classes provide a natural structure for code 
that localizes logic and names, and so aids in debugging. 


For instance, because methods are simply functions with a special first argument, we 
can mimic some of their behavior by manually passing objects to be processed to simple 
functions. The participation of methods in class inheritance, though, allows us to nat- 
urally customize existing software by coding subclasses with new method definitions, 
rather than changing existing code in-place. There is really no such concept with mod- 
ules and functions. 


As an example, suppose you’re assigned the task of implementing an employee database 
application. As a Python OOP programmer, you might begin by coding a general superclass 
that defines default behavior common to all the kinds of employees in your 
organization: 


class Employee:                      # General superclass 
    def computeSalary(self): ...     # Common or default behavior 
    def giveRaise(self): ... 
    def promote(self): ... 
    def retire(self): ... 
Once you’ve coded this general behavior, you can specialize it for each specific kind of 
employee to reflect how the various types differ from the norm. That is, you can code 
subclasses that customize just the bits of behavior that differ per employee type; the 
rest of the employee types’ behavior will be inherited from the more general class. For 
example, if engineers have a unique salary computation rule (i.e., not hours times rate), 
you can replace just that one method in a subclass: 


class Engineer(Employee):            # Specialized subclass 
    def computeSalary(self): ...     # Something custom here 


Because the computeSalary version here appears lower in the class tree, it will replace 
(override) the general version in Employee. You then create instances of the kinds of 
employee classes that the real employees belong to, to get the correct behavior: 


bob = Employee() # Default behavior 
mel = Engineer() # Custom salary calculator 


Notice that you can make instances of any class in a tree, not just the ones at the 
bottom—the class you make an instance from determines the level at which the at- 
tribute search will begin. Ultimately, these two instance objects might wind up em- 
bedded in a larger container object (e.g., a list, or an instance of another class) that 
represents a department or company using the composition idea mentioned at the start 
of this chapter. 


When you later ask for these employees’ salaries, they will be computed according to 
the classes from which the objects were made, due to the principles of the inheritance 
search:‡ 


company = [bob, mel]                 # A composite object 
for emp in company: 
    print(emp.computeSalary())       # Run this object's version 
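
To make the outline above runnable, the elided method bodies need some content. The version below is only a sketch: the hours and rate attributes, the constructor, and the engineer's flat bonus are all assumptions invented for illustration.

```python
class Employee:
    def __init__(self, hours, rate):           # hypothetical record fields
        self.hours = hours
        self.rate = rate

    def computeSalary(self):
        return self.hours * self.rate          # default rule: hours times rate

class Engineer(Employee):
    def computeSalary(self):
        # Hypothetical custom rule: the default pay plus a flat bonus
        return Employee.computeSalary(self) + 500

bob = Employee(40, 10)                         # default behavior
mel = Engineer(40, 10)                         # custom salary calculator

company = [bob, mel]
salaries = [emp.computeSalary() for emp in company]
print(salaries)                                # [400, 900]
```

The loop never tests which kind of employee it has; the inheritance search picks the right computeSalary for each object.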


This is yet another instance of the idea of polymorphism introduced in Chapter 4 and 
revisited in Chapter 16. Recall that polymorphism means that the meaning of an op- 
eration depends on the object being operated on. Here, the method computeSalary is 
located by inheritance search in each object before it is called. In other applications, 
polymorphism might also be used to hide (i.e., encapsulate) interface differences. For 
example, a program that processes data streams might be coded to expect objects with 
input and output methods, without caring what those methods actually do: 

def processor(reader, converter, writer): 
    while True: 
        data = reader.read() 
        if not data: break 
        data = converter(data) 
        writer.write(data) 


‡ Note that the company list in this example could be stored in a file with Python object pickling, introduced in 
Chapter 9 when we met files, to yield a persistent employee database. Python also comes with a module 
named shelve, which would allow you to store the pickled representation of the class instances in an 
access-by-key filesystem; the third-party open source ZODB system does the same but has better support for 
production-quality object-oriented databases. 


By passing in instances of subclasses that specialize the required read and write method 
interfaces for various data sources, we can reuse the processor function for any data 
source we need to use, both now and in the future: 

class Reader: 
    def read(self): ...              # Default behavior and tools 
    def other(self): ... 

class FileReader(Reader): 
    def read(self): ...              # Read from a local file 

class SocketReader(Reader): 
    def read(self): ...              # Read from a network socket 

processor(FileReader(...), Converter, FileWriter(...)) 
processor(SocketReader(...), Converter, TapeWriter(...)) 
processor(FtpReader(...), Converter, XmlWriter(...)) 


Moreover, because the internal implementations of those read and write methods have 
been factored into single locations, they can be changed without impacting code such 
as this that uses them. In fact, the processor function might itself be a class to allow 
the conversion logic of converter to be filled in by inheritance, and to allow readers 
and writers to be embedded by composition (we’ll see how this works later in this part 
of the book). 
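
The stream-processing outline can be tried out with in-memory stand-ins for the data source and destination. In this sketch, ListReader and ListWriter are hypothetical classes invented for the demonstration, and the built-in str.upper serves as the converter.

```python
def processor(reader, converter, writer):
    while True:
        data = reader.read()
        if not data:
            break
        writer.write(converter(data))

class Reader:
    def read(self):
        return ''                        # default: nothing to read

class ListReader(Reader):                # hypothetical in-memory data source
    def __init__(self, lines):
        self.lines = list(lines)

    def read(self):
        # Hand back one item per call, and '' when the data runs out
        return self.lines.pop(0) if self.lines else ''

class ListWriter:                        # hypothetical in-memory destination
    def __init__(self):
        self.output = []

    def write(self, data):
        self.output.append(data)

writer = ListWriter()
processor(ListReader(['spam', 'eggs']), str.upper, writer)
print(writer.output)                     # ['SPAM', 'EGGS']
```

Because processor relies only on the read and write method names, any objects that provide them can be passed in.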


Once you get used to programming this way (by software customization), you’ll find 
that when it’s time to write a new program, much of your work may already be done— 
your task largely becomes one of mixing together existing superclasses that already 
implement the behavior required by your program. For example, someone else might 
have written the Employee, Reader, and Writer classes in this example for use in a com- 
pletely different program. If so, you get all of that person’s code “for free.” 


In fact, in many application domains, you can fetch or purchase collections of super- 
classes, known as frameworks, that implement common programming tasks as classes, 
ready to be mixed into your applications. These frameworks might provide database 
interfaces, testing protocols, GUI toolkits, and so on. With frameworks, you often 
simply code a subclass that fills in an expected method or two; the framework classes 
higher in the tree do most of the work for you. Programming in such an OOP world is 
just a matter of combining and specializing already debugged code by writing subclasses 
of your own. 
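
The framework idea can be sketched in miniature. Everything here is hypothetical (ReportFramework, CsvReport, and the format_record method are invented names, not part of any real library): the superclass drives the overall process, and the subclass fills in the one method the superclass expects.

```python
class ReportFramework:                   # hypothetical framework superclass
    def run(self, records):
        # The framework drives the process and calls the method
        # that subclasses are expected to fill in
        lines = [self.format_record(rec) for rec in records]
        return '\n'.join(lines)

    def format_record(self, record):
        raise NotImplementedError('subclass must define format_record')

class CsvReport(ReportFramework):        # our code: one method fills in the blank
    def format_record(self, record):
        return ','.join(str(field) for field in record)

print(CsvReport().run([('bob', 40), ('mel', 50)]))
```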


Of course, it takes a while to learn how to leverage classes to achieve such OOP utopia. 
In practice, object-oriented work also entails substantial design work to fully realize 
the code reuse benefits of classes—to this end, programmers have begun cataloging 
common OOP structures, known as design patterns, to help with design issues. The 
actual code you write to do OOP in Python, though, is so simple that it will not in itself 
pose an additional obstacle to your OOP quest. To see why, you'll have to move on to 
Chapter 26. 
Chapter Summary 


We took an abstract look at classes and OOP in this chapter, taking in the big picture 
before we dive into syntax details. As we’ve seen, OOP is mostly about looking up 
attributes in trees of linked objects; we call this lookup an inheritance search. Objects 
at the bottom of the tree inherit attributes from objects higher up in the tree—a feature 
that enables us to program by customizing code, rather than changing it, or starting 
from scratch. When used well, this model of programming can cut development time 
radically. 


The next chapter will begin to fill in the coding details behind the picture painted here. 
As we get deeper into Python classes, though, keep in mind that the OOP model in 
Python is very simple; as I’ve already stated, it’s really just about looking up attributes 
in object trees. Before we move on, here’s a quick quiz to review what we’ve covered 
here. 


Test Your Knowledge: Quiz 

1. What is the main point of OOP in Python? 
2. Where does an inheritance search look for an attribute? 
3. What is the difference between a class object and an instance object? 
4. Why is the first argument in a class method function special? 
5. What is the __init__ method used for? 
6. How do you create a class instance? 
7. How do you create a class? 
8. How do you specify a class’s superclasses? 


Test Your Knowledge: Answers 


1. OOP is about code reuse—you factor code to minimize redundancy and program 
by customizing what already exists instead of changing code in-place or starting 
from scratch. 


2. An inheritance search looks for an attribute first in the instance object, then in the 
class the instance was created from, then in all higher superclasses, progressing 
from the bottom to the top of the object tree, and from left to right (by default). 
The search stops at the first place the attribute is found. Because the lowest version 
of a name found along the way wins, class hierarchies naturally support customi- 
zation by extension. 


3. Both class and instance objects are namespaces (packages of variables that appear 
as attributes). The main difference between them is that classes are a kind of factory 
for creating multiple instances. Classes also support operator overloading meth- 
ods, which instances inherit, and treat any functions nested within them as special 
methods for processing instances. 


4. The first argument in a class method function is special because it always receives 
the instance object that is the implied subject of the method call. It’s usually called 
self by convention. Because method functions always have this implied subject 
object context by default, we say they are “object-oriented”; i.e., designed to 
process or change objects. 


5. If the __init__ method is coded or inherited in a class, Python calls it automatically 
each time an instance of that class is created. It’s known as the constructor method; 
it is passed the new instance implicitly, as well as any arguments passed explicitly 
to the class name. It’s also the most commonly used operator overloading method. 
If no __init__ method is present, instances simply begin life as empty namespaces. 


6. You create a class instance by calling the class name as though it were a function; 
any arguments passed into the class name show up as arguments two and beyond 
in the __init__ constructor method. The new instance remembers the class it was 
created from for inheritance purposes. 


7. You create a class by running a class statement; like function definitions, these 
statements normally run when the enclosing module file is imported (more on this 
in the next chapter). 


8. You specify a class’s superclasses by listing them in parentheses in the class statement, 
after the new class’s name. The left-to-right order in which the classes are 
listed in the parentheses gives the left-to-right inheritance search order in the class 
tree. 
CHAPTER 26 
Class Coding Basics 


Now that we’ve talked about OOP in the abstract, it’s time to see how this translates 
to actual code. This chapter begins to fill in the syntax details behind the class model 
in Python. 


If you’ve never been exposed to OOP in the past, classes can seem somewhat compli- 
cated if taken in a single dose. To make class coding easier to absorb, we’ll begin our 
detailed exploration of OOP by taking a first look at some basic classes in action in this 
chapter. We’ll expand on the details introduced here in later chapters of this part of 
the book, but in their basic form, Python classes are easy to understand. 


In fact, classes have just three primary distinctions. At a base level, they are mostly just 
namespaces, much like the modules we studied in Part V. Unlike modules, though, 
classes also have support for generating multiple objects, for namespace inheritance, 
and for operator overloading. Let’s begin our class statement tour by exploring each 
of these three distinctions in turn. 


Classes Generate Multiple Instance Objects 


To understand how the multiple objects idea works, you have to first understand that 
there are two kinds of objects in Python’s OOP model: class objects and instance ob- 
jects. Class objects provide default behavior and serve as factories for instance objects. 
Instance objects are the real objects your programs process—each is a namespace in 
its own right, but inherits (i.e., has automatic access to) names in the class from which 
it was created. Class objects come from statements, and instances come from calls; each 
time you call a class, you get a new instance of that class. 
This object-generation concept is very different from any of the other program constructs 
we’ve seen so far in this book. In effect, classes are essentially factories for generating 
multiple instances. By contrast, only one copy of each module is ever imported 
into a single program (in fact, one reason that we have to call imp.reload, or importlib.reload in newer Python 3 releases, is to update 
the single module object so that changes are reflected once they’ve been made). 
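
The contrast between modules and classes is easy to verify. In this short sketch, the standard library's string module stands in for any module, and Spam is a hypothetical empty class.

```python
import sys
import string
import string as string_again        # a second import does not make a second module

class Spam:
    pass

a = Spam()
b = Spam()

print(string is string_again)            # True: one module object per program
print(sys.modules['string'] is string)   # True: imports are cached in sys.modules
print(a is b)                            # False: each class call makes a new instance
```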


The following is a quick summary of the bare essentials of Python OOP. As you'll see, 
Python classes are in some ways similar to both defs and modules, but they may be 
quite different from what you’re used to in other languages. 


Class Objects Provide Default Behavior 


When we run a class statement, we get a class object. Here’s a rundown of the main 
properties of Python classes: 


• The class statement creates a class object and assigns it a name. Just like the 
function def statement, the Python class statement is an executable statement. 
When reached and run, it generates a new class object and assigns it to the name 
in the class header. Also, like defs, class statements typically run when the files 
they are coded in are first imported. 


• Assignments inside class statements make class attributes. Just like in module 
files, top-level assignments within a class statement (not nested in a def) generate 
attributes in a class object. Technically, the class statement scope morphs into the 
attribute namespace of the class object, just like a module’s global scope. After 
running a class statement, class attributes are accessed by name qualification: 
object.name. 


• Class attributes provide object state and behavior. Attributes of a class object 
record state information and behavior to be shared by all instances created from 
the class; function def statements nested inside a class generate methods, which 
process instances. 
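
The following sketch shows the class statement behaving as ordinary executable code. The names Demo, events, and spam are hypothetical; the append call in the class body is there only to prove that the body runs when the statement is reached.

```python
events = []

class Demo:
    events.append('class body ran')      # an ordinary statement, run once
    spam = 42                            # top-level assignment: class attribute

    def show(self):                      # def assigns a name too: Demo.show
        return self.spam

print(events)                            # ['class body ran']
print(Demo.spam)                         # 42, fetched by name qualification
print(Demo().show())                     # 42, inherited by an instance
```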


Instance Objects Are Concrete Items 


When we call a class object, we get an instance object. Here’s an overview of the key 
points behind class instances: 


• Calling a class object like a function makes a new instance object. Each time 
a class is called, it creates and returns a new instance object. Instances represent 
concrete items in your program’s domain. 


• Each instance object inherits class attributes and gets its own namespace. 
Instance objects created from classes are new namespaces; they start out empty 
but inherit attributes that live in the class objects from which they were generated. 


• Assignments to attributes of self in methods make per-instance attributes. 
Inside class method functions, the first argument (called self by convention) references 
the instance object being processed; assignments to attributes of self create 
or change data in the instance, not the class. 
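
The last point deserves a quick demonstration. In this sketch (Record, kind, and setkind are hypothetical names), assigning through self creates a name in one instance and leaves the class and the other instance untouched.

```python
class Record:
    kind = 'generic'                 # lives in the class

    def setkind(self, kind):
        self.kind = kind             # creates kind in the instance, not in Record

x = Record()
y = Record()
x.setkind('special')

print(x.kind)                        # 'special': found in x itself
print(y.kind)                        # 'generic': still inherited from the class
print(Record.kind)                   # 'generic': the class was not changed
```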


A First Example 


Let’s turn to a real example to show how these ideas work in practice. To begin, let’s 
define a class named FirstClass by running a Python class statement interactively: 


>>> class FirstClass:                    # Define a class object 
...     def setdata(self, value):        # Define class methods 
...         self.data = value            # self is the instance 
...     def display(self): 
...         print(self.data)             # self.data: per instance 
... 


We're working interactively here, but typically, such a statement would be run when 
the module file it is coded in is imported. Like functions created with defs, this class 
won’t even exist until Python reaches and runs this statement. 


Like all compound statements, the class starts with a header line that lists the class 
name, followed by a body of one or more nested and (usually) indented statements. 
Here, the nested statements are defs; they define functions that implement the behavior 
the class means to export. 


As we learned in Part IV, def is really an assignment. Here, it assigns function objects 
to the names setdata and display in the class statement’s scope, and so generates 
attributes attached to the class: FirstClass.setdata and FirstClass.display. In fact, 
any name assigned at the top level of the class’s nested block becomes an attribute of 
the class. 


Functions inside a class are usually called methods. They’re coded with normal defs, 
and they support everything we’ve learned about functions already (they can have de- 
faults, return values, and so on). But in a method function, the first argument auto- 
matically receives an implied instance object when called—the subject of the call. We 
need to create a couple of instances to see how this works: 


>>> x = FirstClass() # Make two instances 
>>> y = FirstClass() # Each is a new namespace 


By calling the class this way (notice the parentheses), we generate instance objects, 
which are just namespaces that have access to their classes’ attributes. Properly speak- 
ing, at this point, we have three objects: two instances and a class. Really, we have three 
linked namespaces, as sketched in Figure 26-1. In OOP terms, we say that x “is a” 
FirstClass, as is y. 


[Figure 26-1 diagram: the class FirstClass (setdata, display), with the instances x and y linked to it]


Figure 26-1. Classes and instances are linked namespace objects in a class tree that is searched by 
inheritance. Here, the “data” attribute is found in instances, but “setdata” and “display” are in the 
class above them. 


The two instances start out empty but have links back to the class from which they 
were generated. If we qualify an instance with the name of an attribute that lives in the 
class object, Python fetches the name from the class by inheritance search (unless it 
also lives in the instance): 


>>> x.setdata("King Arthur") # Call methods: self is x 
>>> y.setdata(3.14159) # Runs: FirstClass.setdata(y, 3.14159) 


Neither x nor y has a setdata attribute of its own, so to find it, Python follows the link 
from instance to class. And that’s about all there is to inheritance in Python: it happens 
at attribute qualification time, and it just involves looking up names in linked objects 
(e.g., by following the is-a links in Figure 26-1). 


In the setdata function inside FirstClass, the value passed in is assigned to 
self.data. Within a method, self—the name given to the leftmost argument by con- 
vention—automatically refers to the instance being processed (x or y), so the assign- 
ments store values in the instances’ namespaces, not the class’s (that’s how the data 
names in Figure 26-1 are created). 
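
The comment on the second call above can be checked directly. The following is a minimal sketch that re-creates FirstClass so it runs on its own; it calls setdata once through an instance and once through the class with the instance passed in explicitly. Both forms should store the value in the instance, not in the class:

```python
class FirstClass:
    def setdata(self, value):
        self.data = value              # Stored in the instance, not the class
    def display(self):
        print(self.data)

x = FirstClass()
y = FirstClass()

x.setdata("King Arthur")               # Instance call: Python passes x to self
FirstClass.setdata(y, 3.14159)         # Class call: we pass y to self ourselves

print(x.data, y.data)                  # Each instance has its own data
print(hasattr(FirstClass, "data"))     # The class itself has no data attribute
```

The instance form is the one normally used; the class form is mostly useful for seeing what the instance form does on our behalf.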


Because classes can generate multiple instances, methods must go through the self 
argument to get to the instance to be processed. When we call the class’s display 
method to print self.data, we see that it’s different in each instance; on the other hand, 
the name display itself is the same in x and y, as it comes (is inherited) from the class: 


>>> x.display()                      # self.data differs in each instance
King Arthur
>>> y.display()
3.14159


Notice that we stored different object types in the data member in each instance (a 
string and a floating-point number). As with everything else in Python, there are no declarations 
for instance attributes (sometimes called members); they spring into existence the first 
time they are assigned values, just like simple variables. In fact, if we were to call 
display on one of our instances before calling setdata, we would trigger an undefined 
name error (an AttributeError exception): the attribute named data doesn’t even exist 
in memory until it is assigned within the setdata method. 
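
To see this error for ourselves, here is a small sketch that again re-creates FirstClass so it can run standalone. The instance name w and the variable error are chosen here only for illustration:

```python
class FirstClass:
    def setdata(self, value):
        self.data = value
    def display(self):
        print(self.data)

w = FirstClass()                       # New instance: no data attribute yet
try:
    w.display()                        # Fetches self.data before any assignment
except AttributeError as exc:
    error = exc
    print("AttributeError:", exc)

w.setdata("spam")                      # The assignment creates the attribute
w.display()                            # Now the fetch succeeds
```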


As another way to appreciate how dynamic this model is, consider that we can change 
instance attributes in the class itself, by assigning to self in methods, or outside the 
class, by assigning to an explicit instance object: 


>>> x.data = "New value" # Can get/set attributes 
>>> x.display() # Outside the class too 
New value 


Although less common, we could even generate a brand new attribute in the instance’s 
namespace by assigning to its name outside the class’s method functions: 


>>> x.anothername = "spam"           # Can set new attributes here too!


This would attach a new attribute called anothername, which may or may not be used 
by any of the class’s methods, to the instance object x. Classes usually create all of the 
instance’s attributes by assignment to the self argument, but they don’t have to; pro- 
grams can fetch, change, or create attributes on any objects to which they have 
references. 
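
As a rough illustration, the built-in vars function lets us list the names stored in an instance's own namespace. This sketch uses a cut-down FirstClass with only the setdata method, so that it runs on its own:

```python
class FirstClass:
    def setdata(self, value):
        self.data = value

x = FirstClass()
x.setdata("King Arthur")               # Attribute created inside a method
x.anothername = "spam"                 # Attribute created outside the class

print(sorted(vars(x)))                 # Names stored in x's own namespace

y = FirstClass()
print(hasattr(y, "anothername"))       # Other instances are unaffected
```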


Classes Are Customized by Inheritance 


Besides serving as factories for generating multiple instance objects, classes also allow 
us to make changes by introducing new components (called subclasses), instead of 
changing existing components in-place. Instance objects generated from a class inherit 
the class’s attributes. Python also allows classes to inherit from other classes, opening 
the door to coding hierarchies of classes that specialize behavior—by redefining attrib- 
utes in subclasses that appear lower in the hierarchy, we override the more general 
definitions of those attributes higher in the tree. In effect, the further down the hierarchy 
we go, the more specific the software becomes. Here, too, there is no parallel with 
modules: their attributes live in a single, flat namespace that is not as amenable to 
customization. 


In Python, instances inherit from classes, and classes inherit from superclasses. Here 
are the key ideas behind the machinery of attribute inheritance: 


• Superclasses are listed in parentheses in a class header. To inherit attributes 
from another class, just list the class in parentheses in a class statement’s header. 
The class that inherits is usually called a subclass, and the class that is inherited 
from is its superclass. 


• Classes inherit attributes from their superclasses. Just as instances inherit the 
attribute names defined in their classes, classes inherit all the attribute names 
defined in their superclasses; Python finds them automatically when they’re accessed, 
if they don’t exist in the subclasses. 


• Instances inherit attributes from all accessible classes. Each instance gets 
names from the class it’s generated from, as well as all of that class’s superclasses. 
When looking for a name, Python checks the instance, then its class, then all 
superclasses. 


• Each object attribute reference invokes a new, independent search. Python 
performs an independent search of the class tree for each attribute fetch expression. 
This includes references to instances and classes made outside class statements 
(e.g., X.attr), as well as references to attributes of the self instance argument in 
class method functions. Each self.attr expression in a method invokes a new 
search for attr in self and above. 


• Logic changes are made by subclassing, not by changing superclasses. By 
redefining superclass names in subclasses lower in the hierarchy (class tree), 
subclasses replace and thus customize inherited behavior. 


The net effect, and the main purpose of all this searching, is that classes support fac- 
toring and customization of code better than any other language tool we’ve seen so far. 
On the one hand, they allow us to minimize code redundancy (and so reduce mainte- 
nance costs) by factoring operations into a single, shared implementation; on the other, 
they allow us to program by customizing what already exists, rather than changing it 
in-place or starting from scratch. 
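
The search order described in the list above can be sketched in a few lines. The class names Super and Sub and the attribute names here are hypothetical, chosen only to show where each fetch is resolved:

```python
class Super:
    name = "super"                     # Found only if nothing lower defines it
    kind = "general"

class Sub(Super):
    name = "sub"                       # Overrides Super.name for Sub and its instances

a = Sub()
b = Sub()
b.name = "instance"                    # Stored in b itself, so found first for b

print(a.name, b.name)                  # The lowest definition wins in each search
print(a.kind)                          # Not in a or Sub, so fetched from Super
print(Super.name)                      # The superclass is left unchanged
```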


A Second Example 


To illustrate the role of inheritance, this next example builds on the previous one. First, 
we'll define a new class, SecondClass, that inherits all of FirstClass’s names and pro- 
vides one of its own: 


>>> class SecondClass(FirstClass):               # Inherits setdata
        def display(self):                       # Changes display
            print('Current value = "%s"' % self.data)


SecondClass defines the display method to print with a different format. By defining 
an attribute with the same name as an attribute in FirstClass, SecondClass effectively 
replaces the display attribute in its superclass. 


Recall that inheritance searches proceed upward from instances, to subclasses, to 
superclasses, stopping at the first appearance of the attribute name found. In this 
case, since the display name in SecondClass will be found before the one in 
FirstClass, we say that SecondClass overrides FirstClass’s display. Sometimes we call this 
act of replacing attributes by redefining them lower in the tree overloading. 


The net effect here is that SecondClass specializes FirstClass by changing the behavior 
of the display method. On the other hand, SecondClass (and any instances created from 
it) still inherits the setdata method in FirstClass verbatim. Let’s make an instance to 
demonstrate: 


>>> z = SecondClass()
>>> z.setdata(42)                    # Finds setdata in FirstClass
>>> z.display()                      # Finds overridden method in SecondClass
Current value = "42"


As before, we make a SecondClass instance object by calling it. The setdata call still 
runs the version in FirstClass, but this time the display attribute comes from 
SecondClass and prints a custom message. Figure 26-2 sketches the namespaces involved. 


[Figure 26-2 diagram: the instance Z (data) links to SecondClass (display), which links to FirstClass (setdata, display)]


Figure 26-2. Specialization by overriding inherited names by redefining them in extensions lower in 
the class tree. Here, SecondClass redefines and so customizes the “display” method for its instances. 


Now, here’s a very important thing to notice about OOP: the specialization introduced 
in SecondClass is completely external to FirstClass. That is, it doesn’t affect existing 
or future FirstClass objects, like the x from the prior example: 


>>> x.display() # x is still a FirstClass instance (old message) 
New value 


Rather than changing FirstClass, we customized it. Naturally, this is an artificial ex- 
ample, but as a rule, because inheritance allows us to make changes like this in external 
components (i.e., in subclasses), classes often support extension and reuse better than 
functions or modules can. 


Classes Are Attributes in Modules 


Before we move on, remember that there’s nothing magic about a class name. It’s just 
a variable assigned to an object when the class statement runs, and the object can be 
referenced with any normal expression. For instance, if our FirstClass was coded in a 
module file instead of being typed interactively, we could import it and use its name 
normally in a class header line: 


from modulename import FirstClass           # Copy name into my scope
class SecondClass(FirstClass):              # Use class name directly
    def display(self): ...


Or, equivalently: 


import modulename                           # Access the whole module
class SecondClass(modulename.FirstClass):   # Qualify to reference
    def display(self): ...


Like everything else, class names always live within a module, so they must follow all 
the rules we studied in Part V. For example, more than one class can be coded in a 
single module file; like other statements in a module, class statements are run during 
imports to define names, and these names become distinct module attributes. More 
generally, each module may arbitrarily mix any number of variables, functions, and 
classes, and all names in a module behave the same way. The file food.py demonstrates: 


# food.py

var = 1                 # food.var

def func():             # food.func
    ...
class spam:             # food.spam
    ...
class ham:              # food.ham
    ...
class eggs:             # food.eggs
    ...


This holds true even if the module and class happen to have the same name. For ex- 
ample, given the following file, person.py: 


class person:
    ...


we need to go through the module to fetch the class as usual: 


import person # Import module 
x = person.person() # Class within module 


Although this path may look redundant, it’s required: person.person refers to the 
person class inside the person module. Saying just person gets the module, not the class, 
unless the from statement is used: 


from person import person # Get class from module 
x = person() # Use class name 


As with any other variable, we can never see a class in a file without first importing and 
somehow fetching it from its enclosing file. If this seems confusing, don’t use the same 
name for a module and a class within it. In fact, common convention in Python dictates 
that class names should begin with an uppercase letter, to help make them more 
distinct: 


import person # Lowercase for modules 
X = person.Person() # Uppercase for classes 


Also, keep in mind that although classes and modules are both namespaces for attach- 
ing attributes, they correspond to very different source code structures: a module re- 
flects an entire file, but a class is a statement within a file. We’ll say more about such 
distinctions later in this part of the book. 


Classes Can Intercept Python Operators 


Let’s move on to the third major difference between classes and modules: operator 
overloading. In simple terms, operator overloading lets objects coded with classes in- 
tercept and respond to operations that work on built-in types: addition, slicing, print- 
ing, qualification, and so on. It’s mostly just an automatic dispatch mechanism— 
expressions and other built-in operations route control to implementations in classes. 
Here, too, there is nothing similar in modules: modules can implement function calls, 
but not the behavior of expressions. 


Although we could implement all class behavior as method functions, operator over- 
loading lets objects be more tightly integrated with Python’s object model. Moreover, 
because operator overloading makes our own objects act like built-ins, it tends to foster 
object interfaces that are more consistent and easier to learn, and it allows class-based 
objects to be processed by code written to expect a built-in type’s interface. Here is a 
quick rundown of the main ideas behind overloading operators: 


• Methods named with double underscores (__X__) are special hooks. Python 
operator overloading is implemented by providing specially named methods to 
intercept operations. The Python language defines a fixed and unchangeable 
mapping from each of these operations to a specially named method. 


• Such methods are called automatically when instances appear in built-in 
operations. For instance, if an instance object inherits an __add__ method, that 
method is called whenever the object appears in a + expression. The method’s 
return value becomes the result of the corresponding expression. 


• Classes may override most built-in type operations. There are dozens of special 
operator overloading method names for intercepting and implementing nearly 
every operation available for built-in types. This includes expressions, but also basic 
operations like printing and object creation. 


• There are no defaults for operator overloading methods, and none are 
required. If a class does not define or inherit an operator overloading method, it 
just means that the corresponding operation is not supported for the class’s 
instances. If there is no __add__, for example, + expressions raise exceptions. 


• Operators allow classes to integrate with Python’s object model. By 
overloading type operations, user-defined objects implemented with classes can act just 
like built-ins, and so provide consistency as well as compatibility with expected 
interfaces. 
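
As a quick sketch of the dispatch just described, the following defines two hypothetical classes, Plain and Number (names invented here for illustration). Only Number supplies an __add__ method, so only its instances support the + expression:

```python
class Plain:
    def __init__(self, value):
        self.value = value

class Number(Plain):
    def __add__(self, other):          # Called for: instance + other
        return Number(self.value + other)

n = Number(40) + 2                     # Runs Number.__add__(Number(40), 2)
print(n.value)

try:
    Plain(40) + 2                      # No __add__ defined or inherited
except TypeError as exc:
    failure = exc
    print("TypeError:", exc)
```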


Operator overloading is an optional feature; it’s used primarily by people developing 
tools for other Python programmers, not by application developers. And, candidly, you 
probably shouldn’t try to use it just because it seems “cool.” Unless a class needs to 
mimic built-in type interfaces, it should usually stick to simpler named methods. Why 
would an employee database application support expressions like * and +, for example? 
Named methods like giveRaise and promote would usually make more sense. 


Because of this, we won’t go into details on every operator overloading method available 
in Python in this book. Still, there is one operator overloading method you are likely 
to see in almost every realistic Python class: the __init__ method, which is known as 
the constructor method and is used to initialize objects’ state. You should pay special 
attention to this method, because __init__, along with the self argument, turns out 
to be a key requirement to understanding most OOP code in Python. 


A Third Example 


On to another example. This time, we’ll define a subclass of SecondClass that imple- 
ments three specially named attributes that Python will call automatically: 


• __init__ is run when a new instance object is created (self is the new ThirdClass 
object).*

• __add__ is run when a ThirdClass instance appears in a + expression. 

• __str__ is run when an object is printed (technically, when it’s converted to its 
print string by the str built-in function or its Python internals equivalent). 


Our new subclass also defines a normally named method named mul, which changes 
the instance object in-place. Here’s the new subclass: 


>>> class ThirdClass(SecondClass):               # Inherit from SecondClass
        def __init__(self, value):               # On "ThirdClass(value)"
            self.data = value
        def __add__(self, other):                # On "self + other"
            return ThirdClass(self.data + other)
        def __str__(self):                       # On "print(self)", "str()"
            return '[ThirdClass: %s]' % self.data
        def mul(self, other):                    # In-place change: named
            self.data *= other

>>> a = ThirdClass('abc')            # __init__ called
>>> a.display()                      # Inherited method called
Current value = "abc"
>>> print(a)                         # __str__: returns display string
[ThirdClass: abc]

>>> b = a + 'xyz'                    # __add__: makes a new instance
>>> b.display()                      # b has all ThirdClass methods
Current value = "abcxyz"
>>> print(b)                         # __str__: returns display string
[ThirdClass: abcxyz]

>>> a.mul(3)                         # mul: changes instance in-place
>>> print(a)
[ThirdClass: abcabcabc]


* Not to be confused with the __init__.py files in module packages! See Chapter 23 for more details. 


ThirdClass “is a” SecondClass, so its instances inherit the customized display method 
from SecondClass. This time, though, ThirdClass creation calls pass an argument (e.g., 
“abc”). This argument is passed to the value argument in the __init__ constructor and 
assigned to self.data there. The net effect is that ThirdClass arranges to set the data 
attribute automatically at construction time, instead of requiring setdata calls after the 
fact. 


Further, ThirdClass objects can now show up in + expressions and print calls. For +, 
Python passes the instance object on the left to the self argument in __add__ and the 
value on the right to other, as illustrated in Figure 26-3; whatever __add__ returns 
becomes the result of the + expression. For print, Python passes the object being printed 
to self in __str__; whatever string this method returns is taken to be the print string 
for the object. With __str__ we can use a normal print to display objects of this class, 
instead of calling the special display method. 


[Figure 26-3 diagram: a + expression on an instance is routed to __add__(self, other)]


Figure 26-3. In operator overloading, expression operators and other built-in operations performed 
on class instances are mapped back to specially named methods in the class. These special methods 
are optional and may be inherited as usual. Here, a + expression triggers the __add__ method. 


Specially named methods such as __init__, __add__, and __str__ are inherited by 
subclasses and instances, just like any other names assigned in a class. If they’re not coded 
in a class, Python looks for such names in all its superclasses, as usual. Operator 
overloading method names are also not built-in or reserved words; they are just attributes 
that Python looks for when objects appear in various contexts. Python usually calls 
them automatically, but they may occasionally be called by your code as well; the 
__init__ method, for example, is often called manually to trigger superclass 
constructors (more on this later). 


Notice that the __add__ method makes and returns a new instance object of its class, 
by calling ThirdClass with the result value. By contrast, mul changes the current instance 
object in-place, by reassigning an attribute of self (self.data). We could overload the * 
expression to do the latter, but this would be too different from the behavior of * for built-in types 
such as numbers and strings, for which it always makes new objects. Common practice 
dictates that overloaded operators should work the same way that built-in operator 
implementations do. Because operator overloading is really just an expression-to-method 
dispatch mechanism, though, you can interpret operators any way you like in 
your own class objects. 
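
To make the contrast concrete, here is a minimal sketch of a hypothetical class named Repeater (not part of this chapter's examples). Its __mul__ follows the built-in convention of returning a new object, while its named mul method changes the instance in-place:

```python
class Repeater:
    def __init__(self, data):
        self.data = data
    def __mul__(self, times):          # On "self * times": build a new object
        return Repeater(self.data * times)
    def mul(self, times):              # Named method: change self in-place
        self.data *= times

a = Repeater("abc")
b = a * 2                              # a is unchanged; b is a new instance
print(a.data, b.data)

a.mul(3)                               # a itself is modified
print(a.data)
```

Keeping the operator free of side effects this way means code that already uses * on strings or numbers sees the same behavior from our instances.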


Why Use Operator Overloading? 


As a class designer, you can choose to use operator overloading or not. Your choice 
simply depends on how much you want your object to look and feel like built-in types. 
As mentioned earlier, if you omit an operator overloading method and do not inherit 
it from a superclass, the corresponding operation will not be supported for your in- 
stances; if it’s attempted, an exception will be thrown (or a standard default will be 
used). 


Frankly, many operator overloading methods tend to be used only when implementing 
objects that are mathematical in nature; a vector or matrix class may overload the 
addition operator, for example, but an employee class likely would not. For simpler 
classes, you might not use overloading at all, and would rely instead on explicit method 
calls to implement your objects’ behavior. 


On the other hand, you might decide to use operator overloading if you need to pass 
a user-defined object to a function that was coded to expect the operators available on 
a built-in type like a list or a dictionary. Implementing the same operator set in your 
class will ensure that your objects support the same expected object interface and so 
are compatible with the function. Although we won’t cover every operator overloading 
method in this book, we’ll see some additional operator overloading techniques in 
action in Chapter 29. 
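
As one possible illustration of this interface compatibility, the sketch below assumes a function, first_and_last, that was written with lists in mind, and a hypothetical Squares class that supplies just the two operations the function relies on (len and indexing):

```python
def first_and_last(seq):
    # Written with lists in mind: relies only on len() and indexing
    return seq[0], seq[len(seq) - 1]

class Squares:
    def __init__(self, count):
        self.count = count
    def __len__(self):                 # Called for len(instance)
        return self.count
    def __getitem__(self, index):      # Called for instance[index]
        if not 0 <= index < self.count:
            raise IndexError(index)
        return index ** 2

print(first_and_last([1, 2, 3]))       # A built-in list
print(first_and_last(Squares(5)))      # A class instance with the same interface
```

The function never checks the type of its argument; it works with any object that responds to the operations it uses.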


One overloading method we will explore here is the __init__ constructor method, 
which seems to show up in almost every realistic class. Because it allows classes to fill 
out the attributes in their newly created instances immediately, the constructor is useful 
for almost every kind of class you might code. In fact, even though instance attributes 
are not declared in Python, you can usually find out which attributes an instance will 
have by inspecting its class’s __init__ method. 


The World’s Simplest Python Class 


We've begun studying class statement syntax in detail in this chapter, but I'd again 
like to remind you that the basic inheritance model that classes produce is very simple— 
all it really involves is searching for attributes in trees of linked objects. In fact, we can 
create a class with nothing in it at all. The following statement makes a class with no 
attributes attached (an empty namespace object): 


>>> class rec: pass # Empty namespace object 


We need the no-operation pass statement (discussed in Chapter 13) here because we 
don’t have any methods to code. After we make the class by running this statement 
interactively, we can start attaching attributes to the class by assigning names to it 
completely outside of the original class statement: 


>>> rec.name = 'Bob' # Just objects with attributes 
>>> rec.age = 40 


And, after we’ve created these attributes by assignment, we can fetch them with the 
usual syntax. When used this way, a class is roughly similar to a “struct” in C, or a 
“record” in Pascal. It’s basically an object with field names attached to it (we can do 
similar work with dictionary keys, but it requires extra characters): 


>>> print(rec.name)                  # Like a C struct or a record
Bob 


Notice that this works even though there are no instances of the class yet; classes are 
objects in their own right, even without instances. In fact, they are just self-contained 
namespaces, so as long as we have a reference to a class, we can set or change its 
attributes anytime we wish. Watch what happens when we do create two instances, 
though: 


>>> x = rec() # Instances inherit class names 
>>> y = rec() 


These instances begin their lives as completely empty namespace objects. Because they 
remember the class from which they were made, though, they will obtain the attributes 
we attached to the class by inheritance: 


>>> x.name, y.name                   # name is stored on the class only
('Bob', 'Bob')


Really, these instances have no attributes of their own; they simply fetch the name 
attribute from the class object where it is stored. If we do assign an attribute to an instance, 
though, it creates (or changes) the attribute in that object, and no other—attribute 
references kick off inheritance searches, but attribute assignments affect only the ob- 
jects in which the assignments are made. Here, x gets its own name, but y still inherits 
the name attached to the class above it: 


>>> x.name = 'Sue'                   # But assignment changes x only
>>> rec.name, x.name, y.name
('Bob', 'Sue', 'Bob')


In fact, as we’ll explore in more detail in Chapter 28, the attributes of a namespace 
object are usually implemented as dictionaries, and class inheritance trees are (generally 
speaking) just dictionaries with links to other dictionaries. If you know where to look, 
you can see this explicitly. 


For example, the __dict__ attribute is the namespace dictionary for most class-based 
objects (some classes may also define attributes in __slots__, an advanced and seldom-used 
feature that we’ll study in Chapters 30 and 31). The following was run in Python 
3.0; the order of names and set of __X__ internal names present can vary from release 
to release, but the names we assigned are present in all: 


>>> list(rec.__dict__.keys())
['__module__', 'name', 'age', '__dict__', '__weakref__', '__doc__']

>>> list(x.__dict__.keys())
['name']

>>> list(y.__dict__.keys())          # list() not required in Python 2.6
[]


Here, the class’s namespace dictionary shows the name and age attributes we assigned 
to it, x has its own name, and y is still empty. Each instance has a link to its class for 
inheritance, though; it’s called __class__, if you want to inspect it: 


>>> x.__class__
<class '__main__.rec'>


Classes also have a __bases__ attribute, which is a tuple of their superclasses: 


>>> rec.__bases__                    # () empty tuple in Python 2.6
(<class 'object'>,)


These two attributes are how class trees are literally represented in memory by Python. 


The main point to take away from this look under the hood is that Python’s class model 
is extremely dynamic. Classes and instances are just namespace objects, with attributes 
created on the fly by assignment. Those assignments usually happen within the class 
statements you code, but they can occur anywhere you have a reference to one of the 
objects in the tree. 
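
To tie these attributes together, here is a deliberately simplified sketch of an inheritance search written by hand. The function name lookup is hypothetical, and the code is only an approximation: Python's real attribute fetch uses a precomputed search order and handles extra cases (such as slots and descriptors) that this version ignores.

```python
def lookup(instance, name):
    # Check the instance's own namespace dictionary first
    if name in instance.__dict__:
        return instance.__dict__[name]
    # Then climb the class tree: the class, then its superclasses
    pending = [instance.__class__]
    while pending:
        klass = pending.pop(0)
        if name in klass.__dict__:
            return klass.__dict__[name]
        pending.extend(klass.__bases__)
    raise AttributeError(name)

class rec: pass
rec.name = 'Bob'                       # Class attributes, as in this section
rec.age = 40

x = rec()
y = rec()
x.name = 'Sue'                         # Stored in x only

print(lookup(x, 'name'), lookup(y, 'name'), lookup(x, 'age'))
```

For a simple tree like this one, the manual search should agree with normal x.name and y.name fetches.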


Even methods, normally created by a def nested in a class, can be created completely 
independently of any class object. The following, for example, defines a simple function 
outside of any class that takes one argument: 


>>> def upperName(self):
        return self.name.upper()     # Still needs a self


There is nothing about a class here yet—it’s a simple function, and it can be called as 
such at this point, provided we pass in an object with a name attribute (the name self 
does not make this special in any way): 


>>> upperName(x)                     # Call as a simple function
'SUE'


If we assign this simple function to an attribute of our class, though, it becomes a 
method, callable through any instance (as well as through the class name itself, as long 
as we pass in an instance manually): 


>>> rec.method = upperName 


>>> x.method()                       # Run method to process x
'SUE'
>>> y.method()                       # Same, but pass y to self
'BOB'


† In fact, this is one of the reasons the self argument must always be explicit in Python methods: because 
methods can be created as simple functions independent of a class, they need to make the implied instance 
argument explicit. They can be called as either functions or methods, and Python can neither guess nor 
assume that a simple function might eventually become a class method. The main reason for the explicit 
self argument, though, is to make the meanings of names more obvious: names not referenced through 
self are simple variables, while names referenced through self are obviously instance attributes. 


>>> rec.method(x)                    # Can call through instance or class
'SUE'


Normally, classes are filled out by class statements, and instance attributes are created 
by assignments to self attributes in method functions. The point again, though, is that 
they don’t have to be; OOP in Python really is mostly about looking up attributes in 
linked namespace objects. 


Classes Versus Dictionaries 


Although the simple classes of the prior section are meant to illustrate class model 
basics, the techniques they employ can also be used for real work. For example, Chap- 
ter 8 showed how to use dictionaries to record properties of entities in our programs. 
It turns out that classes can serve this role, too—they package information like dic- 
tionaries, but can also bundle processing logic in the form of methods. For reference, 
here is the example for dictionary-based records we used earlier in the book: 


>>> rec = {}
>>> rec['name'] = 'mel'              # Dictionary-based record
>>> rec['age'] = 45
>>> rec['job'] = 'trainer/writer'
>>>
>>> print(rec['name'])
mel


This code emulates tools like records in other languages. As we just saw, though, there 
are also multiple ways to do the same with classes. Perhaps the simplest is this—trading 
keys for attributes: 


>>> class rec: pass

>>> rec.name = 'mel'                 # Class-based record
>>> rec.age = 45
>>> rec.job = 'trainer/writer'
>>>
>>> print(rec.age)
45


This code has substantially less syntax than the dictionary equivalent. It uses an empty 
class statement to generate an empty namespace object. Once we make the empty 
class, we fill it out by assigning class attributes over time, as before. 


This works, but a new class statement will be required for each distinct record we will 
need. Perhaps more typically, we can instead generate instances of an empty class to 
represent each distinct entity: 


>>> class rec: pass

>>> pers1 = rec()                    # Instance-based records
>>> pers1.name = 'mel'
>>> pers1.job = 'trainer'
>>> pers1.age = 40
>>>
>>> pers2 = rec()
>>> pers2.name = 'vls'
>>> pers2.job = 'developer'
>>>
>>> pers1.name, pers2.name
('mel', 'vls')


Here, we make two records from the same class. Instances start out life empty, just like 
classes. We then fill in the records by assigning to attributes. This time, though, there 
are two separate objects, and hence two separate name attributes. In fact, instances of 
the same class don’t even have to have the same set of attribute names; in this example, 
one has a unique age name. Instances really are distinct namespaces, so each has a 
distinct attribute dictionary. Although they are normally filled out consistently by class 
methods, they are more flexible than you might expect. 
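To make the separate-namespace point concrete, the following minimal sketch inspects each object's attribute dictionary with the built-in vars function (which returns the same dictionary as the __dict__ attribute). It reuses the rec, pers1, and pers2 names from the session above and assumes nothing else.

```python
class rec: pass                         # Empty class: a bare namespace object

pers1 = rec()                           # Two instances, filled out differently
pers1.name = 'mel'
pers1.job = 'trainer'
pers1.age = 40

pers2 = rec()
pers2.name = 'vls'
pers2.job = 'developer'

print(sorted(vars(pers1)))              # ['age', 'job', 'name']
print(sorted(vars(pers2)))              # ['job', 'name']
print(hasattr(pers2, 'age'))            # False: age lives on pers1 only
```

Each instance carries only the names assigned to it, which is why pers2 has no age.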


Finally, we might instead code a more full-blown class to implement the record and its 
processing:


>>> class Person:
...     def __init__(self, name, job):      # Class = Data + Logic
...         self.name = name
...         self.job = job
...     def info(self):
...         return (self.name, self.job)
...
>>> rec1 = Person('mel', 'trainer')
>>> rec2 = Person('vls', 'developer')
>>>
>>> rec1.job, rec2.info()
('trainer', ('vls', 'developer'))


This scheme also makes multiple instances, but the class is not empty this time: we’ve
added logic (methods) to initialize instances at construction time and collect attributes 
into a tuple. The constructor imposes some consistency on instances here by always 
setting the name and job attributes. Together, the class’s methods and instance attributes 
create a package, which combines both data and logic. 


We could further extend this code by adding logic to compute salaries, parse names, 
and so on. Ultimately, we might link the class into a larger hierarchy to inherit an 
existing set of methods via the automatic attribute search of classes, or perhaps even 
store instances of the class in a file with Python object pickling to make them persistent. 
In fact, we will—in the next chapter, we’ll expand on this analogy between classes and 
records with a more realistic running example that demonstrates class basics in action. 


In the end, although types like dictionaries are flexible, classes allow us to add behavior 
to objects in ways that built-in types and simple functions do not directly support. 
Although we can store functions in dictionaries, too, using them to process implied 
instances is nowhere near as natural as it is in classes. 
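The contrast in that last point can be sketched in a few lines. In the dictionary version below, the function is stored in the record but must still be handed the record explicitly on every call; in the class version, Python passes the instance to self automatically. The names dict_info, drec, and crec are illustrative only and are not part of the earlier examples.

```python
def dict_info(record):                  # Plain function: subject is explicit
    return (record['name'], record['job'])

drec = {'name': 'mel', 'job': 'trainer', 'info': dict_info}
print(drec['info'](drec))               # Must pass the record in by hand

class Person:
    def __init__(self, name, job):
        self.name = name
        self.job = job
    def info(self):                     # Method: subject arrives in self
        return (self.name, self.job)

crec = Person('mel', 'trainer')
print(crec.info())                      # Instance is passed automatically
```

Both calls return the same tuple; the difference is only in who supplies the subject of the operation.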
Chapter Summary


This chapter introduced the basics of coding classes in Python. We studied the syntax 
of the class statement, and we saw how to use it to build up a class inheritance tree. 
We also studied how Python automatically fills in the first argument in method func- 
tions, how attributes are attached to objects in a class tree by simple assignment, and 
how specially named operator overloading methods intercept and implement built-in 
operations for our instances (e.g., expressions and printing). 


Now that we’ve learned all about the mechanics of coding classes in Python, the next 
chapter turns to a larger and more realistic example that ties together much of what 
we've learned about OOP so far. After that, we’ll continue our look at class coding, 
taking a second pass over the model to fill in some of the details that were omitted here 
to keep things simple. First, though, let’s work through a quiz to review the basics we’ve 
covered so far. 


Test Your Knowledge: Quiz 


1. How are classes related to modules?

2. How are instances and classes created?

3. Where and how are class attributes created?

4. Where and how are instance attributes created?

5. What does self mean in a Python class?

6. How is operator overloading coded in a Python class?

7. When might you want to support operator overloading in your classes?

8. Which operator overloading method is most commonly used?

9. What are the two key concepts required to understand Python OOP code?


Test Your Knowledge: Answers 


1. Classes are always nested inside a module; they are attributes of a module object. 
Classes and modules are both namespaces, but classes correspond to statements 
(not entire files) and support the OOP notions of multiple instances, inheritance, 
and operator overloading. In a sense, a module is like a single-instance class, with- 
out inheritance, which corresponds to an entire file of code. 


2. Classes are made by running class statements; instances are created by calling a 
class as though it were a function.


3. Class attributes are created by assigning attributes to a class object. They are
normally generated by top-level assignments nested in a class statement: each name
assigned in the class statement block becomes an attribute of the class object 
(technically, the class statement scope morphs into the class object’s attribute 
namespace). Class attributes can also be created, though, by assigning attributes 
to the class anywhere a reference to the class object exists—i.e., even outside the 
class statement. 


4. Instance attributes are created by assigning attributes to an instance object. They 
are normally created within class method functions inside the class statement by 
assigning attributes to the self argument (which is always the implied instance). 
Again, though, they may be created by assignment anywhere a reference to the 
instance appears, even outside the class statement. Normally, all instance 
attributes are initialized in the __init__ constructor method; that way, later
method calls can assume the attributes already exist. 


5. self is the name commonly given to the first (leftmost) argument in a class method 
function; Python automatically fills it in with the instance object that is the implied 
subject of the method call. This argument need not be called self (though this is 
a very strong convention); its position is what is significant. (Ex-C++ or Java pro- 
grammers might prefer to call it this because in those languages that name reflects 
the same idea; in Python, though, this argument must always be explicit.) 


6. Operator overloading is coded in a Python class with specially named methods; 
they all begin and end with double underscores to make them unique. These are 
not built-in or reserved names; Python just runs them automatically when an in- 
stance appears in the corresponding operation. Python itself defines the mappings 
from operations to special method names. 


7. Operator overloading is useful to implement objects that resemble built-in types 
(e.g., sequences or numeric objects such as matrixes), and to mimic the built-in 
type interface expected by a piece of code. Mimicking built-in type interfaces en- 
ables you to pass in class instances that also have state information—i.e., attributes 
that remember data between operation calls. You shouldn’t use operator over- 
loading when a simple named method will suffice, though. 


8. The __init__ constructor method is the most commonly used; almost every class 
uses this method to set initial values for instance attributes and perform other 
startup tasks. 


9. The special self argument in method functions and the __init__ constructor
method are the two cornerstones of OOP code in Python.


CHAPTER 27
A More Realistic Example 


We'll dig into more class syntax details in the next chapter. Before we do, though, I’d 
like to show you a more realistic example of classes in action that’s more practical than 
what we’ve seen so far. In this chapter, we’re going to build a set of classes that do 
something more concrete—recording and processing information about people. As 
you'll see, what we call instances and classes in Python programming can often serve 
the same roles as records and programs in more traditional terms. 


Specifically, in this chapter we’re going to code two classes: 


* Person: a class that creates and processes information about people

* Manager: a customization of Person that modifies inherited behavior


Along the way, we’ll make instances of both classes and test out their functionality. 
When we’re done, I’ll show you a nice example use case for classes: we’ll store our
instances in a shelve object-oriented database, to make them permanent. That way, you 
can use this code as a template for fleshing out a full-blown personal database written 
entirely in Python. 


Besides actual utility, though, our aim here is also educational: this chapter provides a 
tutorial on object-oriented programming in Python. Often, people grasp the last chap- 
ter’s class syntax on paper, but have trouble seeing how to get started when confronted 
with having to code a new class from scratch. Toward this end, we’ll take it one step 
at a time here, to help you learn the basics; we’ll build up the classes gradually, so you 
can see how their features come together in complete programs. 


In the end, our classes will still be relatively small in terms of code, but they will dem- 
onstrate all of the main ideas in Python’s OOP model. Despite its syntax details, Py- 
thon’s class system really is largely just a matter of searching for an attribute in a tree 
of objects, along with a special first argument for functions.


Step 1: Making Instances


OK, so much for the design phase—let’s move on to implementation. Our first task is 
to start coding the main class, Person. In your favorite text editor, open a new file for 
the code we’ll be writing. It’s a fairly strong convention in Python to begin module 
names with a lowercase letter and class names with an uppercase letter; like the name 
of self arguments in methods, this is not required by the language, but it’s so common 
that deviating might be confusing to people who later read your code. To conform, 
we'll call our new module file person.py and our class within it Person, like this: 


# File person.py (start) 
class Person: 


All our work will be done in this file until later in this chapter. We can code any number 
of functions and classes in a single module file in Python, and this one’s person.py name 
might not make much sense if we add unrelated components to it later. For now, we’ll 
assume everything in it will be Person-related. It probably should be anyhow—as we’ve 
learned, modules tend to work best when they have a single, cohesive purpose. 


Coding Constructors 


Now, the first thing we want to do with our Person class is record basic information 
about people—to fill out record fields, if you will. Of course, these are known as in- 
stance object attributes in Python-speak, and they generally are created by assignment 
to self attributes in class method functions. The normal way to give instance attributes 
their first values is to assign them to self in the __init__ constructor method, which
contains code run automatically by Python each time an instance is created. Let’s add 
one to our class: 


# Add record field initialization

class Person:
    def __init__(self, name, job, pay):      # Constructor takes 3 arguments
        self.name = name                     # Fill out fields when created
        self.job = job                       # self is the new instance object
        self.pay = pay


This is a very common coding pattern: we pass in the data to be attached to an instance 
as arguments to the constructor method and assign them to self to retain them per- 
manently. In OO terms, self is the newly created instance object, and name, job, and 
pay become state information—descriptive data saved on an object for later use. Al- 
though other techniques (such as enclosing scope references) can save details, too, 
instance attributes make this very explicit and easy to understand. 


Notice that the argument names appear twice here. This code might seem a bit redun- 
dant at first, but it’s not. The job argument, for example, is a local variable in the scope 
of the __init__ function, but self.job is an attribute of the instance that’s the implied
subject of the method call. They are two different variables, which happen to have the
same name. By assigning the job local to the self.job attribute with self.job=job, we
save the passed-in job on the instance for later use. As usual in Python, where a name 
is assigned (or what object it is assigned to) determines what it means. 
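As a quick check of that distinction, the sketch below rebinds the local job name after saving it on the instance. The rebinding line is added purely for illustration and is not part of the class being built in this chapter.

```python
class Person:
    def __init__(self, name, job, pay):
        self.name = name
        self.job = job                  # Save the passed-in object on the instance
        job = 'changed locally'         # Rebinds the local only (illustration)
        self.pay = pay

bob = Person('Bob Smith', 'dev', 50000)
print(bob.job)                          # The attribute still holds 'dev'
```

The local variable disappears when the constructor returns, while the attribute lives on with the instance.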


Speaking of arguments, there’s really nothing magical about __init__, apart from the 
fact that it’s called automatically when an instance is made and has a special first ar- 
gument. Despite its weird name, it’s a normal function and supports all the features of 
functions we’ve already covered. We can, for example, provide defaults for some of its 
arguments, so they need not be provided in cases where their values aren’t available or 
useful. 


To demonstrate, let’s make the job argument optional—it will default to None, meaning 
the person being created is not (currently) employed. If job defaults to None, we’ll 
probably want to default pay to 0, too, for consistency (unless some of the people you 
know manage to get paid without having jobs!). In fact, we have to specify a default 
for pay because according to Python’s syntax rules, any arguments in a function’s header 
after the first default must all have defaults, too: 


# Add defaults for constructor arguments

class Person:
    def __init__(self, name, job=None, pay=0):      # Normal function args
        self.name = name
        self.job = job
        self.pay = pay


What this code means is that we’ll need to pass in a name when making Persons, but
job and pay are now optional; they’ll default to None and 0 if omitted. The self argu- 
ment, as usual, is filled in by Python automatically to refer to the instance object— 
assigning values to attributes of self attaches them to the new instance. 


Testing As You Go 


This class doesn’t do much yet—it essentially just fills out the fields of a new record— 
but it’s a real working class. At this point we could add more code to it for more features, 
but we won’t do that yet. As you’ve probably begun to appreciate already, programming 
in Python is really a matter of incremental prototyping—you write some code, test it, 
write more code, test again, and so on. Because Python provides both an interactive 
session and nearly immediate turnaround after code changes, it’s more natural to test 
as you go than to write a huge amount of code to test all at once. 


Before adding more features, then, let’s test what we’ve got so far by making a few 
instances of our class and displaying their attributes as created by the constructor. We 
could do this interactively, but as you’ve also probably surmised by now, interactive 
testing has its limits—it gets tedious to have to reimport modules and retype test cases 
each time you start a new testing session. More commonly, Python programmers use
the interactive prompt for simple one-off tests but do more substantial testing by writing
code at the bottom of the file that contains the objects to be tested, like this: 


# Add incremental self-test code

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

bob = Person('Bob Smith')                          # Test the class
sue = Person('Sue Jones', job='dev', pay=100000)   # Runs __init__ automatically
print(bob.name, bob.pay)                           # Fetch attached attributes
print(sue.name, sue.pay)                           # sue's and bob's attrs differ


Notice here that the bob object accepts the defaults for job and pay, but sue provides 
values explicitly. Also note how we use keyword arguments when making sue; we could 
pass by position instead, but the keywords may help remind us later what the data is 
(and they allow us to pass the arguments in any left-to-right order we like). Again, 
despite its unusual name, __init__ is a normal function, supporting everything you 
already know about functions—including both defaults and pass-by-name keyword 
arguments. 
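For instance, the following calls build equivalent records by position, by keyword, and by keyword in a different order. The ann, ann2, and ann3 names and their values are made up for this sketch.

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

ann = Person('Ann Lee', 'mgr', 90000)                # By position
ann2 = Person('Ann Lee', job='mgr', pay=90000)       # By keyword
ann3 = Person(pay=90000, job='mgr', name='Ann Lee')  # Keywords, any order

for rec in (ann, ann2, ann3):
    print(rec.name, rec.job, rec.pay)                # Same three fields each time
```

All three objects end up with the same attribute values; only the call syntax differs.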


When this file runs as a script, the test code at the bottom makes two instances of our
class and prints two attributes of each (name and pay):


C:\misc> person.py
Bob Smith 0
Sue Jones 100000


You can also type this file’s test code at Python’s interactive prompt (assuming you 
import the Person class there first), but coding canned tests inside the module file like 
this makes it much easier to rerun them in the future. 


Although this is fairly simple code, it’s already demonstrating something important. 
Notice that bob’s name is not sue’s, and sue’s pay is not bob’s. Each is an independent 
record of information. Technically, bob and sue are both namespace objects—like all 
class instances, they each have their own independent copy of the state information 
created by the class. Because each instance of a class has its own set of self attributes, 
classes are a natural for recording information for multiple objects this way; just like 
built-in types, classes serve as a sort of object factory. Other Python program structures, 
such as functions and modules, have no such concept. 
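To illustrate the factory idea, this sketch stamps out one instance per row of data and then changes a single record. The rows list is invented sample data, not something used elsewhere in this chapter.

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

rows = [('Bob Smith',), ('Sue Jones', 'dev', 100000)]   # Hypothetical input data
people = [Person(*row) for row in rows]                 # One new instance per row

people[0].pay = 30000                                   # Changes bob's state only
for person in people:
    print(person.name, person.pay)
```

Because each instance is its own namespace, updating the first record leaves the second one untouched.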


Using Code Two Ways 


As is, the test code at the bottom of the file works, but there’s a big catch—its top-level 
print statements run both when the file is run as a script and when it is imported as a 
module. This means if we ever decide to import the class in this file in order to use it 
somewhere else (and we will later in this chapter), we'll see the output of its test code 
every time the file is imported. That’s not very good software citizenship, though: client
programs probably don’t care about our internal tests and won’t want to see our output 
mixed in with their own. 


Although we could split the test code off into a separate file, it’s often more convenient 
to code tests in the same file as the items to be tested. It would be better to arrange to 
run the test statements at the bottom only when the file is run for testing, not when the 
file is imported. That’s exactly what the module __name__ check is designed for, as you 
learned in the preceding part of this book. Here’s what this addition looks like: 


# Allow this file to be imported as well as run/tested

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':                  # When run for testing only
    # self-test code
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)


Now, we get exactly the behavior we’re after: running the file as a top-level script tests
it because its __name__ is __main__, but importing it as a library of classes later does not:


C:\misc> person.py
Bob Smith 0
Sue Jones 100000

c:\misc> python
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) ...
>>> import person
>>>


When imported, the file now defines the class, but does not use it. When run directly, 
this file creates two instances of our class as before, and prints two attributes of each; 
again, because each instance is an independent namespace object, the values of their 
attributes differ. 
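If you would like to verify both behaviors from a single script, the following sketch writes a cut-down copy of the module to a temporary folder, imports it, and then runs it as a script, capturing the printed output each time. The person_demo module name and the capture variables are assumptions made for this demonstration; the standard library's runpy, tempfile, and contextlib modules do the work.

```python
import contextlib
import io
import os
import runpy
import sys
import tempfile

# Hypothetical stand-in for person.py, trimmed to one test line
SOURCE = '''
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':
    bob = Person('Bob Smith')
    print(bob.name, bob.pay)
'''

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'person_demo.py')
    with open(path, 'w') as f:
        f.write(SOURCE)

    # Case 1: imported, so __name__ is 'person_demo' and the tests are skipped
    sys.path.insert(0, folder)
    imported_out = io.StringIO()
    with contextlib.redirect_stdout(imported_out):
        import person_demo
    sys.path.remove(folder)

    # Case 2: run as a script, so __name__ is '__main__' and the tests run
    script_out = io.StringIO()
    with contextlib.redirect_stdout(script_out):
        runpy.run_path(path, run_name='__main__')

print(repr(imported_out.getvalue()))    # Nothing was printed on import
print(repr(script_out.getvalue()))      # The self-test line was printed
```

The import still defines the class (person_demo.Person is usable afterward); only the guarded test code is skipped.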


Version Portability Note 


I’m running all the code in this chapter under Python 3.0, and using the 3.0 print
function call syntax. If you run under 2.6 the code will work as-is, but you'll notice 
parentheses around some output lines because the extra parentheses in prints turn 
multiple items into a tuple: 

c:\misc> c:\python26\python person.py
('Bob Smith', 0)
('Sue Jones', 100000)


If this difference is the sort of detail that might keep you awake at nights, simply remove
the parentheses to use 2.6 print statements. You can also avoid the extra parentheses 
portably by using formatting to yield a single object to print. Either of the following 
works in both 2.6 and 3.0, though the method form is newer: 


print('{0} {1}'.format(bob.name, bob.pay)) # New format method 
print('%s %s' % (bob.name, bob.pay)) # Format expression 


Step 2: Adding Behavior Methods 


Everything looks good so far—at this point, our class is essentially a record factory; it 
creates and fills out fields of records (attributes of instances, in more Pythonic terms). 
Even as limited as it is, though, we can still run some operations on its objects. Although 
classes add an extra layer of structure, they ultimately do most of their work by em- 
bedding and processing basic core data types like lists and strings. In other words, if 
you already know how to use Python’s simple core types, you already know much of 
the Python class story; classes are really just a minor structural extension. 


For example, the name field of our objects is a simple string, so we can extract last names 
from our objects by splitting on spaces and indexing. These are all core data type op- 
erations, which work whether their subjects are embedded in class instances or not: 


>>> name = 'Bob Smith' # Simple string, outside class 
>>> name.split() # Extract last name 

['Bob', 'Smith']
>>> name.split()[-1]         # Or [1], if always just two parts
'Smith'


Similarly, we can give an object a pay raise by updating its pay field—that is, by changing 
its state information in-place with an assignment. This task also involves basic opera- 
tions that work on Python’s core objects, regardless of whether they are standalone or 
embedded in a class structure:


>>> pay = 100000             # Simple variable, outside class
>>> pay *= 1.10              # Give a 10% raise
>>> print(pay)               # Or: pay = pay * 1.10, if you like to type
110000.0                     # Or: pay = pay + (pay * .10), if you _really_ do!


To apply these operations to the Person objects created by our script, simply do to 
bob.name and sue.pay what we just did to name and pay. The operations are the same, 
but the subject objects are attached to attributes in our class structure: 


# Process embedded built-in types: strings, mutability

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.name.split()[-1])            # Extract object's last name
    sue.pay *= 1.10                        # Give this object a raise
    print(sue.pay)


We’ve added the last three lines here; when they’re run, we extract bob’s last name by
using basic string and list operations and give sue a pay raise by modifying her pay
attribute in-place with basic number operations. In a sense, sue is also a mutable
object: her state changes in-place just like a list after an append call:


Bob Smith 0
Sue Jones 100000
Smith
110000.0


The preceding code works as planned, but if you show it to a veteran software developer 
he’ll probably tell you that its general approach is not a great idea in practice. Hard- 
coding operations like these outside of the class can lead to maintenance problems in 
the future. 


For example, what if you’ve hardcoded the last-name-extraction formula at many dif- 
ferent places in your program? If you ever need to change the way it works (to support 
a new name structure, for instance), you’ll need to hunt down and update every oc- 
currence. Similarly, if the pay-raise code ever changes (e.g., to require approval or 
database updates), you may have multiple copies to modify. Just finding all the ap- 
pearances of such code may be problematic in larger programs—they may be scattered 
across many files, split into individual steps, and so on. 


Coding Methods 


What we really want to do here is employ a software design concept known as encap- 
sulation. The idea with encapsulation is to wrap up operation logic behind interfaces, 
such that each operation is coded only once in our program. That way, if our needs 
change in the future, there is just one copy to update. Moreover, we’re free to change 
the single copy’s internals almost arbitrarily, without breaking the code that uses it. 


In Python terms, we want to code operations on objects in class methods, instead of 
littering them throughout our program. In fact, this is one of the things that classes are 
very good at—factoring code to remove redundancy and thus optimize maintainability. 
As an added bonus, turning operations into methods enables them to be applied to any 
instance of the class, not just those that they’ve been hardcoded to process. 


This is all simpler in code than it may sound in theory. The following achieves encap- 
sulation by moving the two operations from code outside the class into class methods. 
While we’re at it, let’s change our self-test code at the bottom to use the new methods 
we're creating, instead of hardcoding operations:


# Add methods to encapsulate operations for maintainability

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):                               # Behavior methods
        return self.name.split()[-1]                  # self is implied subject
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))      # Must change here only

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.lastName(), sue.lastName())             # Use the new methods
    sue.giveRaise(.10)                                # instead of hardcoding
    print(sue.pay)


As we’ve learned, methods are simply normal functions that are attached to classes and
designed to process instances of those classes. The instance is the subject of the method
call and is passed to the method’s self argument automatically.


The transformation to the methods in this version is straightforward. The new 
lastName method, for example, simply does to self what the previous version hardco- 
ded for bob, because self is the implied subject when the method is called. lastName 
also returns the result, because this operation is a called function now; it computes a 
value for its caller to use, even if it is just to be printed. Similarly, the new giveRaise 
method just does to self what we did to sue before. 


When run now, our file’s output is similar to before—we’ve mostly just refactored the 
code to allow for easier changes in the future, not altered its behavior: 

Bob Smith 0
Sue Jones 100000
Smith Jones
110000


A few coding details are worth pointing out here. First, notice that sue’s pay is now still 
an integer after a pay raise—we convert the math result back to an integer by calling 
the int built-in within the method. Changing the value to either int or float is probably 
not a significant concern for most purposes (integer and floating-point objects have the 
same interfaces and can be mixed within expressions), but we may need to address 
rounding issues in a real system (money probably matters to Persons!).


As we learned in Chapter 5, we might handle this by using the round(N, 2) built-in to 
round and retain cents, using the decimal type to fix precision, or storing monetary 
values as full floating-point numbers and displaying them with a %.2f or {0:.2f}
formatting string to show cents. For this example, we’ll simply truncate any cents with
int. (For another idea, also see the money function in the formats.py module of
Chapter 24; you can import this tool to show pay with commas, cents, and dollar signs.)
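The options just listed can be compared side by side. The sketch below applies a 10% raise to a made-up pay of 12345 (chosen so that cents appear in the result) and shows what each approach keeps; the variable names are illustrative only.

```python
from decimal import Decimal

pay = 12345                                   # Hypothetical pay with an uneven raise
raised = pay * (1 + .10)                      # Floating-point result, about 13579.5

print(int(raised))                            # Truncate cents, as giveRaise does
print(round(raised, 2))                       # Round to cents, still a float
print('%.2f' % raised)                        # Format expression, two decimals
print('{0:.2f}'.format(raised))               # Format method, two decimals

exact = Decimal(pay) * (1 + Decimal('0.10'))  # Fixed-precision decimal arithmetic
print(exact)
```

Truncating with int drops the fifty cents entirely, while the rounding, formatting, and decimal alternatives all preserve it.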


Second, notice that we’re also printing sue’s last name this time—because the last-name 
logic has been encapsulated in a method, we get to use it on any instance of the class. 
As we’ve seen, Python tells a method which instance to process by automatically pass- 
ing it in to the first argument, usually called self. Specifically: 


* In the first call, bob.lastName(), bob is the implied subject passed to self.

* In the second call, sue.lastName(), sue goes to self instead.


Trace through these calls to see how the instance winds up in self. The net effect is 
that the method fetches the name of the implied subject each time. The same happens 
for giveRaise. We could, for example, give bob a raise by calling giveRaise for both 
instances this way, too; but unfortunately, bob’s zero pay will prevent him from getting 
a raise as the program is currently coded (something we may want to address in a future 
2.0 release of our software). 
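One way to trace these calls is to write them in the explicit form that the instance call is shorthand for: fetching the function from the class and passing the instance yourself. The sketch below does this for lastName, and also confirms the zero-pay effect just described.

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

bob = Person('Bob Smith')
sue = Person('Sue Jones', job='dev', pay=100000)

print(bob.lastName())                   # Instance call: bob goes to self
print(Person.lastName(bob))             # Same call, instance passed by hand
print(Person.lastName(sue))             # Same function, different subject

bob.giveRaise(.10)                      # 10% of zero is still zero
print(bob.pay)
```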


Finally, notice that the giveRaise method assumes that percent is passed in as a floating- 
point number between zero and one. That may be too radical an assumption in the real 
world (a 1000% raise would probably be a bug for most of us!); we’ll let it pass for this 
prototype, but we might want to test or at least document this in a future iteration of 
this code. Stay tuned for a rehash of this idea in a later chapter in this book, where we’ll 
code something called function decorators and explore Python’s assert statement— 
alternatives that can do the validity test for us automatically during development. 
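In the meantime, one simple way to guard the assumption is an explicit range check at the top of the method. This is only a sketch of the idea, not the decorator or assert approach developed later in the book; the choice of ValueError and the zero-to-one bounds are assumptions made for illustration.

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        if not 0.0 <= percent <= 1.0:           # Assumed valid range: 0 to 1
            raise ValueError('percent must be between 0 and 1')
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', job='dev', pay=100000)
sue.giveRaise(.10)                              # Accepted: a 10% raise
print(sue.pay)

try:
    sue.giveRaise(10)                           # Rejected: would mean 1000%
except ValueError as exc:
    print('rejected:', exc)
```

Because the check lives inside the method, every caller gets it for free, which is the same encapsulation benefit described earlier in this step.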


Step 3: Operator Overloading 


At this point, we have a fairly full-featured class that generates and initializes instances, 
along with two new bits of behavior for processing instances (in the form of methods). 
So far, so good. 


As it stands, though, testing is still a bit less convenient than it needs to be—to trace 
our objects, we have to manually fetch and print individual attributes (e.g., bob.name,
sue.pay). It would be nice if displaying an instance all at once actually gave us some 
useful information. Unfortunately, the default display format for an instance object 
isn’t very good—it displays the object’s class name, and its address in memory (which 
is essentially useless in Python, except as a unique identifier). 


To see this, change the last line in the script to print(sue) so it displays the object as a
whole. Here’s what you’ll get (the output says that sue is an “object” in 3.0 and an 
“instance” in 2.6): 

Bob Smith 0
Sue Jones 100000
Smith Jones
<__main__.Person object at 0x02614430>


Providing Print Displays


Fortunately, it’s easy to do better by employing operator overloading—coding methods 
in a class that intercept and process built-in operations when run on the class’s 
instances. Specifically, we can make use of what is probably the second most commonly 
used operator overloading method in Python, after __init__: the __str__ method
introduced in the preceding chapter. __str__ is run automatically every time an instance
is converted to its print string. Because that’s what printing an object does, the net 
transitive effect is that printing an object displays whatever is returned by the object’s 
__str__ method, if it either defines one itself or inherits one from a superclass (double- 
underscored names are inherited just like any other). 


Technically speaking, the __init__ constructor method we’ve already coded is operator
overloading too: it is run automatically at construction time to initialize a newly
created instance. Constructors are so common, though, that they almost seem like a special
case. More focused methods like __str__ allow us to tap into specific operations and
provide specialized behavior when our objects are used in those contexts.


Let’s put this into code. The following extends our class to give a custom display that 
lists attributes when our class’s instances are displayed as a whole, instead of relying 
on the less useful default display: 


# Add __str__ overload method for printing objects

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):                                         # Added method
        return '[Person: %s, %s]' % (self.name, self.pay)      # String to print

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)


Notice that we’re doing string % formatting to build the display string in __str__ here;
at the bottom, classes use built-in type objects and operations like these to get their
work done. Again, everything you’ve already learned about both built-in types and
functions applies to class-based code. Classes largely just add an additional layer of
structure that packages functions and data together and supports extensions.

We’ve also changed our self-test code to print objects directly, instead of printing
individual attributes. When run, the output is more coherent and meaningful now; the
“[...]” lines are returned by our new __str__, run automatically by print operations:

[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]


Here’s a subtle point: as we’ll learn in the next chapter, a related overloading method,
__repr__, provides an as-code low-level display of an object when present. Sometimes
classes provide both a __str__ for user-friendly displays and a __repr__ with extra
details for developers to view. Because printing runs __str__ and the interactive prompt
echoes results with __repr__, this can provide both target audiences with an appropriate
display. Since we’re not interested in displaying an as-code format, __str__ is sufficient
for our class.
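
To make the distinction between the two display methods concrete, here is a minimal sketch of a class that defines both. The Point class and its attributes are hypothetical names invented for this illustration; they are not part of this chapter’s person.py file.

```python
# Hypothetical class, not part of person.py: contrasts __str__ and __repr__

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):                              # Run by print() and str()
        return '(%s, %s)' % (self.x, self.y)

    def __repr__(self):                             # Run by repr() and interactive echoes
        return 'Point(%r, %r)' % (self.x, self.y)

p = Point(1, 2)
print(p)                                            # User-friendly form: (1, 2)
print(repr(p))                                      # As-code form: Point(1, 2)
print([p])                                          # Nested objects display with __repr__
```

If a class defines only __repr__, printing falls back on it as well, which is one reason some programmers code __repr__ alone when a single display format is enough.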


Step 4: Customizing Behavior by Subclassing 


At this point, our class captures much of the OOP machinery in Python: it makes 
instances, provides behavior in methods, and even does a bit of operator overloading 
now to intercept print operations in __str__. It effectively packages our data and logic
together into a single, self-contained software component, making it easy to locate code 
and straightforward to change it in the future. By allowing us to encapsulate behavior, 
it also allows us to factor that code to avoid redundancy and its associated maintenance 
headaches. 


The only major OOP concept it does not yet capture is customization by inheritance. 
In some sense, we’re already doing inheritance, because instances inherit methods from 
their classes. To demonstrate the real power of OOP, though, we need to define a 
superclass/subclass relationship that allows us to extend our software and replace bits 
of inherited behavior. That’s the main idea behind OOP, after all; by fostering a coding 
model based upon customization of work already done, it can dramatically cut devel- 
opment time. 


Coding Subclasses 


As a next step, then, let’s put OOP’s methodology to use and customize our Person 
class by extending our software hierarchy. For the purpose of this tutorial, we’ll define 
a subclass of Person called Manager that replaces the inherited giveRaise method with 
a more specialized version. Our new class begins as follows: 


class Manager(Person):                    # Define a subclass of Person


This code means that we’re defining a new class named Manager, which inherits from 
and may add customizations to the superclass Person. In plain terms, a Manager is almost
like a Person (admittedly, a very long journey for a very small joke...), but Manager has
a custom way to give raises. 


For the sake of argument, let’s assume that when a Manager gets a raise, it receives the 
passed-in percentage as usual, but also gets an extra bonus that defaults to 10%. For 
instance, if a Manager’s raise is specified as 10%, it will really get 20%. (Any relation to 
Persons living or dead is, of course, strictly coincidental.) Our new method begins as 
follows; because this redefinition of giveRaise will be closer in the class tree to 
Manager instances than the original version in Person, it effectively replaces, and thereby 
customizes, the operation. Recall that according to the inheritance search rules, the 
lowest version of the name wins: 


class Manager(Person):                            # Inherit Person attrs
    def giveRaise(self, percent, bonus=.10):      # Redefine to customize


Augmenting Methods: The Bad Way 


Now, there are two ways we might code this Manager customization: a good way and a
bad way. Let’s start with the bad way, since it might be a bit easier to understand. The
bad way is to cut and paste the code of giveRaise in Person and modify it for Manager,
like this:


class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        self.pay = int(self.pay * (1 + percent + bonus))    # Bad: cut-and-paste


This works as advertised—when we later call the giveRaise method of a Manager in- 
stance, it will run this custom version, which tacks on the extra bonus. So what’s wrong 
with something that runs correctly? 


The problem here is a very general one: any time you copy code with cut and paste, 
you essentially double your maintenance effort in the future. Think about it: because 
we copied the original version, if we ever have to change the way raises are given (and 
we probably will), we’ll have to change the code in two places, not one. Although this 
is a small and artificial example, it’s also representative of a universal issue—any time 
you’re tempted to program by copying code this way, you probably want to look for a
better approach. 


Augmenting Methods: The Good Way 


What we really want to do here is somehow augment the original giveRaise, instead of 
replacing it altogether. The good way to do that in Python is by calling to the original 
version directly, with augmented arguments, like this: 

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)    # Good: augment original

This code leverages the fact that a class’s method can always be called either through an
instance (the usual way, where Python sends the instance to the self argument auto- 
matically) or through the class (the less common scheme, where you must pass the 
instance manually). In more symbolic terms, recall that a normal method call of this 
form: 


instance.method(args...)


is automatically translated by Python into this equivalent form: 


class.method(instance, args...) 


where the class containing the method to be run is determined by the inheritance search 
rule applied to the method’s name. You can code either form in your script, but there 
is a slight asymmetry between the two—you must remember to pass along the instance 
manually if you call through the class directly. The method always needs a subject 
instance one way or another, and Python provides it automatically only for calls made 
through an instance. For calls through the class name, you need to send an instance to 
self yourself; for code inside a method like giveRaise, self already is the subject of the 
call, and hence the instance to pass along. 


Calling through the class directly effectively subverts inheritance and kicks the call
higher up the class tree to run a specific version. In our case, we can use this technique
to invoke the default giveRaise in Person, even though it’s been redefined at the
Manager level. In some sense, we must call through Person this way, because a
self.giveRaise() inside Manager’s giveRaise code would loop: since self already is a
Manager, self.giveRaise() would resolve again to Manager.giveRaise, and so on and so
forth until Python’s recursion limit is exceeded and an exception is raised.
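
As a rough check of both claims, the following sketch runs the two call forms side by side and then shows the looping variant failing. The LoopingManager class and the ann object are hypothetical, added only for this illustration, and the Person class is trimmed to the methods needed here.

```python
# Sketch only: LoopingManager and ann are hypothetical names for illustration

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)     # Call through the class

class LoopingManager(Person):
    def giveRaise(self, percent, bonus=.10):
        self.giveRaise(percent + bonus)             # Resolves to this same method again

tom = Manager('Tom Jones', 'mgr', 50000)
ann = Manager('Ann Lee', 'mgr', 50000)
tom.giveRaise(.10)                                  # instance.method(args...)
Manager.giveRaise(ann, .10)                         # class.method(instance, args...)
print(tom.pay, ann.pay)                             # Same result either way

bad = LoopingManager('Bad Example', 'mgr', 50000)
try:
    bad.giveRaise(.10)
    looped = False
except RuntimeError:                                # Recursion-depth errors are RuntimeErrors
    looped = True
print(looped, bad.pay)                              # Pay is never changed by the loop
```

The except clause names RuntimeError because the recursion-depth exception is either that class or a subclass of it, depending on the Python 3.X release in use.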


This “good” version may seem like a small difference in code, but it can make a huge 
difference for future code maintenance—because the giveRaise logic lives in just one 
place now (Person’s method), we have only one version to change in the future as needs 
evolve. And really, this form captures our intent more directly anyhow—we want to 
perform the standard giveRaise operation, but simply tack on an extra bonus. Here’s 
our entire module file with this step applied: 


# Add customization of one behavior in a subclass

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):           # Redefine at this level
        Person.giveRaise(self, percent + bonus)        # Call Person's version

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 'mgr', 50000)           # Make a Manager: __init__
    tom.giveRaise(.10)                                 # Runs custom version
    print(tom.lastName())                              # Runs inherited method
    print(tom)                                         # Runs inherited __str__


To test our Manager subclass customization, we’ve also added self-test code that makes 
a Manager, calls its methods, and prints it. Here’s the new version’s output: 

[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]


Everything looks good here: bob and sue are as before, and when tom the Manager is
given a 10% raise, he really gets 20% (his pay goes from $50K to $60K), because the
customized giveRaise in Manager is run for him only. Also notice how printing tom as a
whole at the end of the test code displays the nice format defined in Person’s __str__:
Manager objects get this, lastName, and the __init__ constructor method’s code “for
free” from Person, by inheritance.


Polymorphism in Action 


To make this acquisition of inherited behavior even more striking, we can add the 
following code at the end of our file: 


if __name__ == '__main__':
    ...
    print('--All three--')
    for object in (bob, sue, tom):            # Process objects generically
        object.giveRaise(.10)                 # Run this object's giveRaise
        print(object)                         # Run the common __str__


Here’s the resulting output: 


[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
--All three--
[Person: Bob Smith, 0]
[Person: Sue Jones, 121000]
[Person: Tom Jones, 72000]


In the added code, object is either a Person or a Manager, and Python runs the appro- 
priate giveRaise automatically—our original version in Person for bob and sue, and our 
customized version in Manager for tom. Trace the method calls yourself to see how Py- 
thon selects the right giveRaise method for each object. 


This is just Python’s notion of polymorphism, which we met earlier in the book, at work 
again—what giveRaise does depends on what you do it to. Here, it’s made all the more 
obvious when it selects from code we’ve written ourselves in classes. The practical effect 
in this code is that sue gets another 10% but tom gets another 20%, because 
giveRaise is dispatched based upon the object’s type. As we’ve learned, polymorphism 
is at the heart of Python’s flexibility. Passing any of our three objects to a function that 
calls a giveRaise method, for example, would have the same effect: the appropriate 
version would be run automatically, depending on which type of object was passed. 
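
The last point can be sketched with a small function. The applyRaise helper below is a hypothetical name introduced for this illustration, and the classes are trimmed copies of the ones coded so far; the function never tests the type of the object it receives.

```python
# Sketch only: applyRaise is a hypothetical helper, not part of person.py

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

def applyRaise(obj, percent):
    obj.giveRaise(percent)                    # Which giveRaise runs depends on obj
    return obj.pay

sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 'mgr', 50000)
print(applyRaise(sue, .10))                   # Person's version: 10%
print(applyRaise(tom, .10))                   # Manager's version: 20%
```

Because the function relies only on the presence of a giveRaise method, it would also work for any future subclass that redefines the method again.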


On the other hand, printing runs the same __str__ for all three objects, because it’s
coded just once in Person. Manager both specializes and applies the code we originally 
wrote in Person. Although this example is small, it’s already leveraging OOP’s talent 
for code customization and reuse; with classes, this almost seems automatic at times. 


Inherit, Customize, and Extend 


In fact, classes can be even more flexible than our example implies. In general, classes 
can inherit, customize, or extend existing code in superclasses. For example, although 
we’re focused on customization here, we can also add unique methods to Manager that
are not present in Person, if Managers require something completely different (Python 
namesake reference intended). The following snippet illustrates. Here, giveRaise re- 
defines a superclass method to customize it, but someThingElse defines something new 
to extend: 

class Person:
    def lastName(self): ...
    def giveRaise(self): ...
    def __str__(self): ...

class Manager(Person):                        # Inherit
    def giveRaise(self, ...): ...             # Customize
    def someThingElse(self, ...): ...         # Extend

tom = Manager()
tom.lastName()                                # Inherited verbatim
tom.giveRaise()                               # Customized version
tom.someThingElse()                           # Extension here
print(tom)                                    # Inherited overload method


Extra methods like this code’s someThingElse extend the existing software and are
available on Manager objects only, not on Persons. For the purposes of this tutorial, however,
we’ll limit our scope to customizing some of Person’s behavior by redefining it, not
adding to it.


OOP: The Big Idea 


As is, our code may be small, but it’s fairly functional. And really, it already illustrates 
the main point behind OOP in general: in OOP, we program by customizing what has 
already been done, rather than copying or changing existing code. This isn’t always an 
obvious win to newcomers at first glance, especially given the extra coding requirements 
of classes. But overall, the programming style implied by classes can cut development 
time radically compared to other approaches. 


For instance, in our example we could theoretically have implemented a custom 
giveRaise operation without subclassing, but none of the other options yield code as 
optimal as ours: 


* Although we could have simply coded Manager from scratch as new, independent
  code, we would have had to reimplement all the behaviors in Person that are the
  same for Managers.

* Although we could have simply changed the existing Person class in-place for the
  requirements of Manager’s giveRaise, doing so would probably break the places
  where we still need the original Person behavior.

* Although we could have simply copied the Person class in its entirety, renamed the
  copy to Manager, and changed its giveRaise, doing so would introduce code
  redundancy that would double our work in the future: changes made to Person in
  the future would not be picked up automatically, but would have to be manually
  propagated to Manager’s code. As usual, the cut-and-paste approach may seem
  quick now, but it doubles your work in the future.


The customizable hierarchies we can build with classes provide a much better solution 
for software that will evolve over time. No other tools in Python support this develop- 
ment mode. Because we can tailor and extend our prior work by coding new subclasses, 
we can leverage what we’ve already done, rather than starting from scratch each time, 
breaking what already works, or introducing multiple copies of code that may all have 
to be updated in the future. When done right, OOP is a powerful programmer’s ally. 


Step 5: Customizing Constructors, Too 


Our code works as it is, but if you study the current version closely, you may be struck 
by something a bit odd—it seems pointless to have to provide a mgr job name for 
Manager objects when we create them: this is already implied by the class itself. It would 
be better if we could somehow fill in this value automatically when a Manager is made. 


The trick we need to improve on this turns out to be the same as the one we employed 
in the prior section: we want to customize the constructor logic for Managers in such a 
way as to provide a job name automatically. In terms of code, we want to redefine an
__init__ method in Manager that provides the mgr string for us. And as with the
giveRaise customization, we also want to run the original __init__ in Person by calling
through the class name, so it still initializes our objects’ state information attributes.


The following extension will do the job—we’ve coded the new Manager constructor and 
changed the call that creates tom to not pass in the mgr job name: 


# Add customization of constructor in a subclass

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def __init__(self, name, pay):                     # Redefine constructor
        Person.__init__(self, name, 'mgr', pay)        # Run original with 'mgr'
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)                  # Job name not needed:
    tom.giveRaise(.10)                                 # Implied/set by class
    print(tom.lastName())
    print(tom)


Again, we’re using the same technique to augment the __init__ constructor here that
we used for giveRaise earlier—running the superclass version by calling through the 
class name directly and passing the self instance along explicitly. Although the con- 
structor has a strange name, the effect is identical. Because we need Person’s construc- 
tion logic to run too (to initialize instance attributes), we really have to call it this way; 
otherwise, instances would not have any attributes attached. 


Calling superclass constructors from redefinitions this way turns out to be a very
common coding pattern in Python. By itself, Python uses inheritance to look for and
call only one __init__ method at construction time: the lowest one in the class tree. If
you need higher __init__ methods to be run at construction time (and you usually do),
you must call them manually through the superclass’s name. The upside to this is that
you can be explicit about which arguments to pass up to the superclass’s constructor
and can choose to not call it at all: not calling the superclass constructor allows you to
replace its logic altogether, rather than augmenting it.
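
To see what happens when the superclass constructor is not run, consider the following sketch. The Forgetful class and the oops object are hypothetical, invented only to contrast with Manager; Person is trimmed to its constructor.

```python
# Sketch only: Forgetful is a hypothetical subclass that skips Person.__init__

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

class Manager(Person):
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)     # Run the superclass constructor too

class Forgetful(Person):
    def __init__(self, name, pay):
        pass                                        # Replaces Person.__init__ entirely

tom = Manager('Tom Jones', 50000)
print(tom.job)                                      # Filled in by Manager's constructor

oops = Forgetful('Ann Lee', 40000)
print(hasattr(oops, 'name'))                        # No attributes were ever assigned
print(oops.__dict__)                                # Empty instance namespace
```

Python runs only Forgetful’s constructor here, so the attribute assignments in Person never happen; fetching oops.name would raise an AttributeError.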


The output of this file’s self-test code is the same as before—we haven’t changed what 
it does, we’ve simply restructured to get rid of some logical redundancy: 

[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]


OOP Is Simpler Than You May Think 


In this complete form, despite their sizes, our classes capture nearly all the important 
concepts in Python’s OOP machinery: 


* Instance creation: filling out instance attributes

* Behavior methods: encapsulating logic in class methods

* Operator overloading: providing behavior for built-in operations like printing

* Customizing behavior: redefining methods in subclasses to specialize them

* Customizing constructors: adding initialization logic to superclass steps


Most of these concepts are based upon just three simple ideas: the inheritance search 
for attributes in object trees, the special self argument in methods, and operator over- 
loading’s automatic dispatch to methods. 


Along the way, we’ve also made our code easy to change in the future, by harnessing 
the class’s propensity for factoring code to reduce redundancy. For example, we wrap- 
ped up logic in methods and called back to superclass methods from extensions to 
avoid having multiple copies of the same code. Most of these steps were a natural 
outgrowth of the structuring power of classes. 


By and large, that’s all there is to OOP in Python. Classes certainly can become larger 
than this, and there are some more advanced class concepts, such as decorators and 
metaclasses, which we will meet in later chapters. In terms of the basics, though, our 
classes already do it all. In fact, if you’ve grasped the workings of the classes we’ve 
written, most OOP Python code should now be within your reach. 


Other Ways to Combine Classes 


Having said that, I should also tell you that although the basic mechanics of OOP are 
simple in Python, some of the art in larger programs lies in the way that classes are put 
together. We’re focusing on inheritance in this tutorial because that’s the mechanism 
the Python language provides, but programmers sometimes combine classes in other
ways, too. For example, a common coding pattern involves nesting objects inside each 
other to build up composites. We’ll explore this pattern in more detail in Chapter 30, 
which is really more about design than about Python; as a quick example, though, we 
could use this composition idea to code our Manager extension by embedding a 
Person, instead of inheriting from it. 


The following alternative does so by using the __getattr__ operator overloading
method we will meet in Chapter 29 to intercept undefined attribute fetches and delegate 
them to the embedded object with the getattr built-in. The giveRaise method here 
still achieves customization, by changing the argument passed along to the embedded 
object. In effect, Manager becomes a controller layer that passes calls down to the em- 
bedded object, rather than up to superclass methods: 


# Embedding-based Manager alternative

class Person:
    ...same...

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)      # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)      # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)           # Delegate all other attrs
    def __str__(self):
        return str(self.person)                     # Must overload again (in 3.0)

if __name__ == '__main__':
    ...same...


In fact, this Manager alternative is representative of a general coding pattern usually 
known as delegation—a composite-based structure that manages a wrapped object and 
propagates method calls to it. This pattern works in our example, but it requires about 
twice as much code and is less well suited than inheritance to the kinds of direct cus- 
tomizations we meant to express (in fact, no reasonable Python programmer would 
code this example this way in practice, except perhaps those writing general tutorials). 
Manager isn’t really a Person here, so we need extra code to manually dispatch method 
calls to the embedded object; operator overloading methods like __str__ must be
redefined (in 3.0, at least, as noted in the upcoming sidebar “Catching Built-in Attributes
in 3.0” on page 662), and adding new Manager behavior is less straightforward since 
state information is one level removed. 


Still, object embedding, and design patterns based upon it, can be a very good fit when 
embedded objects require more limited interaction with the container than direct cus- 
tomization implies. A controller layer like this alternative Manager, for example, might 
come in handy if we want to trace or validate calls to another object’s methods (indeed, 
we will use a nearly identical coding pattern when we study class decorators later in the 
book). Moreover, a hypothetical Department class like the following could aggregate
other objects in order to treat them as a set. Add this to the bottom of the person.py file 
to try this on your own: 


# Aggregate embedded objects into a composite

bob = Person(...)
sue = Person(...)
tom = Manager(...)

class Department:
    def __init__(self, *args):
        self.members = list(args)
    def addMember(self, person):
        self.members.append(person)
    def giveRaises(self, percent):
        for person in self.members:
            person.giveRaise(percent)
    def showAll(self):
        for person in self.members:
            print(person)

development = Department(bob, sue)          # Embed objects in a composite
development.addMember(tom)
development.giveRaises(.10)                 # Runs embedded objects' giveRaise
development.showAll()                       # Runs embedded objects' __str__s


Interestingly, this code uses both inheritance and composition—Department is a com- 
posite that embeds and controls other objects to aggregate, but the embedded Person 
and Manager objects themselves use inheritance to customize. As another example, a 
GUI might similarly use inheritance to customize the behavior or appearance of labels 
and buttons, but also composition to build up larger packages of embedded widgets, 
such as input forms, calculators, and text editors. The class structure to use depends 
on the objects you are trying to model. 
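
For readers who want to run the composite without editing person.py, here is a self-contained sketch. It assumes the same constructor arguments used for bob, sue, and tom in this chapter’s earlier self-test code, and it repeats trimmed copies of Person and Manager so that it runs on its own.

```python
# Sketch only: constructor arguments are assumed from the earlier self-test code

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

class Department:
    def __init__(self, *args):
        self.members = list(args)
    def addMember(self, person):
        self.members.append(person)
    def giveRaises(self, percent):
        for person in self.members:
            person.giveRaise(percent)           # Each object runs its own version
    def showAll(self):
        for person in self.members:
            print(person)

bob = Person('Bob Smith')
sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 50000)

development = Department(bob, sue)
development.addMember(tom)
development.giveRaises(.10)
development.showAll()
```

The composite applies one 10% request to every member, but tom still ends up with 20% because Department simply calls whatever giveRaise each embedded object provides.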


Design issues like composition are explored in Chapter 30, so we’ll postpone further 
investigations for now. But again, in terms of the basic mechanics of OOP in Python, 
our Person and Manager classes already tell the entire story. Having mastered the basics 
of OOP, though, developing general tools for applying it more easily in your scripts is 
often a natural next step—and the topic of the next section. 


Catching Built-in Attributes in 3.0 


In Python 3.0 (and 2.6 if new-style classes are used), the alternative delegation-based
Manager class we just coded will not be able to intercept and delegate operator
overloading method attributes like __str__ without redefining them. Although we know
that __str__ is the only such name used in our specific example, this is a general issue for
delegation-based classes.


Recall that built-in operations like printing and indexing implicitly invoke operator
overloading methods such as __str__ and __getitem__. In 3.0, built-in operations like
these do not route their implicit attribute fetches through generic attribute managers:
neither __getattr__ (run for undefined attributes) nor its cousin __getattribute__ (run
for all attributes) is invoked. This is why we have to redefine __str__ redundantly in
the alternative Manager, in order to ensure that printing is routed to the embedded
Person object when run in Python 3.0.


Technically, this happens because classic classes normally look up operator overloading
names in instances at runtime, but new-style classes do not: they skip the instance
entirely and look up such methods in classes. In 2.6 classic classes, built-ins do route
attributes generically; printing, for example, routes __str__ through __getattr__.
New-style classes also inherit a default for __str__ that would foil __getattr__, but
__getattribute__ doesn’t intercept the name in 3.0 either.


This is a change, but isn’t a show-stopper—delegation-based classes can generally re- 
define operator overloading methods to delegate them to wrapped objects in 3.0, either 
manually or via tools or superclasses. This topic is too advanced to explore further in 
this tutorial, though, so don’t sweat the details too much here. Watch for it to be 
revisited in the attribute management coverage of Chapter 37, and again in the context 
of Private class decorators in Chapter 38. 
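
The effect described in this sidebar can be observed with a small experiment under Python 3.X. The Wrapped, Proxy, and ProxyWithStr classes below are hypothetical names used only for this sketch; they are not part of the chapter’s files.

```python
# Sketch only: hypothetical classes showing which fetches reach __getattr__ in 3.X

class Wrapped:
    def __str__(self):
        return 'wrapped object'
    def hello(self):
        return 'hello'

class Proxy:                                        # Delegates, but defines no __str__
    def __init__(self, obj):
        self.obj = obj
    def __getattr__(self, attr):
        return getattr(self.obj, attr)              # Run for undefined names only

class ProxyWithStr(Proxy):
    def __str__(self):
        return str(self.obj)                        # Redefined to reach the wrapped object

p = Proxy(Wrapped())
print(p.hello())                                    # Explicit fetch: routed by __getattr__
print(str(p) == 'wrapped object')                   # Implicit __str__ fetch: not routed
print(str(ProxyWithStr(Wrapped())))                 # Routed once __str__ is redefined
```

In the second print, the built-in finds the default __str__ inherited from object in the class, so the proxy displays with the generic default format rather than the wrapped object’s text.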


Step 6: Using Introspection Tools 


Let’s make one final tweak before we throw our objects onto a database. As they are, 
our classes are complete and demonstrate most of the basics of OOP in Python. They 
still have two remaining issues we probably should iron out, though, before we go live 
with them: 


* First, if you look at the display of the objects as they are right now, you’ll notice
  that when you print tom, the Manager, the display labels him as a Person. That’s not
  technically incorrect, since Manager is a kind of customized and specialized Person.
  Still, it would be more accurate to display objects with the most specific (that is,
  lowest) classes possible.

* Second, and perhaps more importantly, the current display format shows only the
  attributes we include in our __str__, and that might not account for future goals.
  For example, we can’t yet verify that tom’s job name has been set to mgr correctly
  by Manager’s constructor, because the __str__ we coded for Person does not print
  this field. Worse, if we ever expand or otherwise change the set of attributes
  assigned to our objects in __init__, we’ll have to remember to also update __str__
  for new names to be displayed, or it will become out of sync over time.


The last point means that, yet again, we’ve made potential extra work for ourselves in 
the future by introducing redundancy in our code. Because any disparity in __str__ will
be reflected in the program’s output, this redundancy may be more obvious than the 
other forms we addressed earlier; still, avoiding extra work in the future is generally a 
good thing. 


Special Class Attributes


We can address both issues with Python’s introspection tools—special attributes and 
functions that give us access to some of the internals of objects’ implementations. These 
tools are somewhat advanced and generally used more by people writing tools for other 
programmers to use than by programmers developing applications. Even so, a basic 
knowledge of some of these tools is useful because they allow us to write code that 
processes classes in generic ways. In our code, for example, there are two hooks that 
can help us out, both of which were introduced near the end of the preceding chapter: 


* The built-in instance.__class__ attribute provides a link from an instance to the
  class from which it was created. Classes in turn have a __name__, just like modules,
  and a __bases__ sequence that provides access to superclasses. We can use these
  here to print the name of the class from which an instance is made rather than one
  we’ve hardcoded.

* The built-in object.__dict__ attribute provides a dictionary with one key/value
  pair for every attribute attached to a namespace object (including modules, classes,
  and instances). Because it is a dictionary, we can fetch its keys list, index by key,
  iterate over its keys, and so on, to process all attributes generically. We can use this
  here to print every attribute in any instance, not just those we hardcode in custom
  displays.


Here’s what these tools look like in action at Python’s interactive prompt. Notice how 
we load Person at the interactive prompt with a from statement here—class names live 
in and are imported from modules, exactly like function names and other variables: 

>>> from person import Person
>>> bob = Person('Bob Smith')
>>> print(bob)                               # Show bob's __str__
[Person: Bob Smith, 0]

>>> bob.__class__                            # Show bob's class and its name
<class 'person.Person'>
>>> bob.__class__.__name__
'Person'

>>> list(bob.__dict__.keys())                # Attributes are really dict keys
['pay', 'job', 'name']                       # Use list to force list in 3.0

>>> for key in bob.__dict__:
        print(key, '=>', bob.__dict__[key])  # Index manually

pay => 0
job => None
name => Bob Smith

>>> for key in bob.__dict__:
        print(key, '=>', getattr(bob, key))  # obj.attr, but attr is a var

pay => 0
job => None
name => Bob Smith
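
The session above does not exercise __bases__, so here is a short script-form sketch that does, using trimmed copies of the Person and Manager classes so that it runs without importing person.py. The results noted in comments assume Python 3.X, where every class ultimately inherits from object.

```python
# Sketch only: trimmed classes, run as a script rather than at the prompt

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

class Manager(Person):
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)

tom = Manager('Tom Jones', 50000)
print(tom.__class__.__name__)                       # Lowest class: Manager
print(tom.__class__.__bases__ == (Person,))         # Superclasses of Manager
print(Person.__bases__ == (object,))                # Implied superclass in 3.X
print(sorted(tom.__dict__))                         # Attribute names on the instance
print(tom.__dict__['job'])                          # Set by Manager's constructor
```

Sorting the keys, as done here, sidesteps any dependence on the order in which a dictionary happens to store its keys.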


As noted briefly in the prior chapter, some attributes accessible from an instance might
not be stored in the __dict__ dictionary if the instance’s class defines __slots__, an
optional and relatively obscure feature of new-style classes (and all classes in Python
3.0) that stores attributes in an array and that we’ll discuss in Chapters 30 and 31. Since
slots really belong to classes instead of instances, and since they are very rarely used in
any event, we can safely ignore them here and focus on the normal __dict__.
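As a quick illustration of this caveat, here is a minimal sketch; the class names Plain and Slotted are invented for the demo and are not part of this chapter’s example. It suggests why a __dict__-based tool would miss slot attributes, while getattr still reaches them by name:

```python
class Plain:
    def __init__(self):
        self.name = 'Bob'              # Stored in the instance's __dict__

class Slotted:
    __slots__ = ['name']               # Attributes live in slots, not a dict
    def __init__(self):
        self.name = 'Sue'

p, s = Plain(), Slotted()
print(list(p.__dict__))                # ['name']
print(hasattr(s, '__dict__'))          # False: no per-instance dictionary
print([(n, getattr(s, n)) for n in Slotted.__slots__])
```

Because slots are rare, the rest of this section keeps assuming a normal __dict__.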


A Generic Display Tool 


We can put these interfaces to work in a superclass that displays accurate class names
and formats all attributes of an instance of any class. Open a new file in your text editor
to code the following: it’s a new, independent module named classtools.py that
implements just such a class. Because its __str__ print overload uses generic introspection
tools, it will work on any instance, regardless of its attribute set. And because this is a
class, it automatically becomes a general formatting tool: thanks to inheritance, it can
be mixed into any class that wishes to use its display format. As an added bonus, if we
ever want to change how instances are displayed we need only change this class, as
every class that inherits its __str__ will automatically pick up the new format when it’s
next run:


# File classtools.py (new)
"Assorted class utilities and tools"

class AttrDisplay:
    """
    Provides an inheritable print overload method that displays
    instances with their class names and a name=value pair for
    each attribute stored on the instance itself (but not attrs
    inherited from its classes). Can be mixed into any class,
    and will work on any instance.
    """
    def gatherAttrs(self):
        attrs = []
        for key in sorted(self.__dict__):
            attrs.append('%s=%s' % (key, getattr(self, key)))
        return ', '.join(attrs)

    def __str__(self):
        return '[%s: %s]' % (self.__class__.__name__, self.gatherAttrs())

if __name__ == '__main__':
    class TopTest(AttrDisplay):
        count = 0
        def __init__(self):
            self.attr1 = TopTest.count
            self.attr2 = TopTest.count+1
            TopTest.count += 2

    class SubTest(TopTest):
        pass

    X, Y = TopTest(), SubTest()
    print(X)                         # Show all instance attrs
    print(Y)                         # Show lowest class name


Notice the docstrings here: as a general-purpose tool, we want to add some functional
documentation for potential users to read. As we saw in Chapter 15, docstrings can be
placed at the top of simple functions and modules, and also at the start of classes and
their methods; the help function and the PyDoc tool extract and display these
automatically (we’ll look at docstrings again in Chapter 28).


When run directly, this module’s self-test makes two instances and prints them; the
__str__ defined here shows the instance’s class, and all its attribute names and values,
in sorted attribute name order:

C:\misc> classtools.py 


[TopTest: attr1=0, attr2=1] 
[SubTest: attr1=2, attr2=3] 


Instance Versus Class Attributes 


If you study the classtools module’s self-test code long enough, you’ll notice that its
class displays only instance attributes, attached to the self object at the bottom of the
inheritance tree; that’s what self’s __dict__ contains. As an intended consequence, we
don’t see attributes inherited by the instance from classes above it in the tree (e.g.,
count in this file’s self-test code). Inherited class attributes are attached to the class only,
not copied down to instances.
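The distinction is easy to check directly. The following sketch uses a cut-down TopTest (without the AttrDisplay superclass, so that it stands alone) to show where each kind of attribute is actually stored:

```python
class TopTest:
    count = 0                          # Class attribute: stored on the class
    def __init__(self):
        self.attr1 = TopTest.count     # Instance attribute: stored on self
        TopTest.count += 1

X = TopTest()
print(X.__dict__)                      # {'attr1': 0}
print('count' in X.__dict__)           # False: not copied to the instance
print('count' in TopTest.__dict__)     # True: attached to the class
print(X.count)                         # 1, found by inheritance search
```

The last line works even though count is absent from the instance’s __dict__, because attribute fetches search the class tree above the instance.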


If you ever do wish to include inherited attributes too, you can climb the __class__ link
to the instance’s class, use the __dict__ there to fetch class attributes, and then iterate
through the class’s __bases__ attribute to climb to even higher superclasses (repeating
as necessary). If you’re a fan of simple code, running a built-in dir call on the instance
instead of using __dict__ and climbing would have much the same effect, since dir
results include inherited names in the sorted results list:


>>> from person import Person
>>> bob = Person('Bob Smith')

# In Python 2.6:

>>> bob.__dict__.keys()                  # Instance attrs only
['pay', 'job', 'name']

>>> dir(bob)                             # + inherited attrs in classes
['__doc__', '__init__', '__module__', '__str__', 'giveRaise', 'job',
'lastName', 'name', 'pay']

# In Python 3.0:

>>> list(bob.__dict__.keys())            # 3.0 keys is a view, not a list
['pay', 'job', 'name']

>>> dir(bob)                             # 3.0 includes class type methods
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
...more lines omitted...
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'giveRaise', 'job', 'lastName', 'name', 'pay']


The output here varies between Python 2.6 and 3.0, because 3.0’s dict.keys is not a
list, and 3.0’s dir returns extra class-type implementation attributes. Technically, dir
returns more in 3.0 because classes are all “new style” and inherit a large set of operator
overloading names from the class type. In fact, you’ll probably want to filter out most
of the __X__ names in the 3.0 dir result, since they are internal implementation details
and not something you’d normally want to display.
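One simple way to do that filtering is a list comprehension over the dir result. This is only a sketch: the Person class below is a trimmed stand-in so the example runs on its own, and the test for leading and trailing double underscores is one reasonable filter among several:

```python
class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith')

# Keep only names that are not of the __X__ form
names = [n for n in dir(bob) if not (n.startswith('__') and n.endswith('__'))]
print(names)                           # ['lastName', 'name']
```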


In the interest of space, we’ll leave optional display of inherited class attributes with
either tree climbs or dir as suggested experiments for now. For more hints on this front,
though, watch for the classtree.py inheritance tree climber we will write in Chapter 28,
and the lister.py attribute listers and climbers we’ll code in Chapter 30.
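If you want a head start on the tree-climbing experiment, the following is one possible sketch. It is not the classtree.py or lister.py code from later chapters; the function name inheritedAttrs and the trimmed Person and Manager classes are assumptions made for this demo:

```python
def inheritedAttrs(instance):
    "Collect names from the instance and every class above it, minus __X__"
    names = set(instance.__dict__)               # Start with instance attrs
    pending = [instance.__class__]               # Classes still to visit
    while pending:
        klass = pending.pop()
        names.update(klass.__dict__)             # Names attached to this class
        pending.extend(klass.__bases__)          # Climb to its superclasses
    return sorted(n for n in names if not n.startswith('__'))

class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

class Manager(Person):
    def giveRaise(self, percent):
        pass

print(inheritedAttrs(Manager('Tom Jones')))      # ['giveRaise', 'lastName', 'name']
```

The loop ends on its own because the topmost class, object, has an empty __bases__ tuple.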


Name Considerations in Tool Classes 


One last subtlety here: because our AttrDisplay class in the classtools module is a
general tool designed to be mixed into other arbitrary classes, we have to be aware of
the potential for unintended name collisions with client classes. As is, I’ve assumed that
client subclasses may want to use both its __str__ and gatherAttrs, but the latter of
these may be more than a subclass expects: if a subclass innocently defines a
gatherAttrs name of its own, it will likely break our class, because the lower version in the
subclass will be used instead of ours.


To see this for yourself, add a gatherAttrs to TopTest in the file’s self-test code; unless 
the new method is identical, or intentionally customizes the original, our tool class will 
no longer work as planned: 


class TopTest(AttrDisplay):

    def gatherAttrs(self):              # Replaces method in AttrDisplay!
        return 'Spam'


This isn’t necessarily bad—sometimes we want other methods to be available to sub- 
classes, either for direct calls or for customization. If we really meant to provide a 
__str__ only, though, this is less than ideal. 


To minimize the chances of name collisions like this, Python programmers often prefix
methods not meant for external use with a single underscore: _gatherAttrs in our case.
This isn’t foolproof (what if another class defines _gatherAttrs, too?), but it’s usually
sufficient, and it’s a common Python naming convention for methods internal to a class.


A better and less commonly used solution would be to use two underscores at the front
of the method name only: __gatherAttrs for us. Python automatically expands such
names to include the enclosing class’s name, which makes them truly unique. This is
a feature usually called pseudoprivate class attributes, which we’ll expand on in
Chapter 30. For now, we’ll make both our methods available.
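For reference, here is a sketch of what the two-underscore variant might look like. It is a hypothetical alternative to the classtools.py code, not the version this chapter uses, and it shows that a client’s own gatherAttrs no longer interferes with the display:

```python
class AttrDisplay:
    def __gatherAttrs(self):                     # Becomes _AttrDisplay__gatherAttrs
        pairs = sorted(self.__dict__.items())
        return ', '.join('%s=%s' % (key, value) for (key, value) in pairs)
    def __str__(self):
        return '[%s: %s]' % (self.__class__.__name__, self.__gatherAttrs())

class TopTest(AttrDisplay):
    def __init__(self):
        self.attr1 = 0
    def gatherAttrs(self):                       # No clash with the mangled name
        return 'Spam'

X = TopTest()
print(X)                                         # [TopTest: attr1=0]
print(X.gatherAttrs())                           # Spam
print('_AttrDisplay__gatherAttrs' in AttrDisplay.__dict__)   # True
```

Only names that start with two underscores and do not end with two are expanded this way, which is why __dict__, __class__, and __str__ are left alone.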


Our Classes’ Final Form 


Now, to use this generic tool in our classes, all we need to do is import it from its
module, mix it in by inheritance in our top-level class, and get rid of the more specific
__str__ we coded before. The new print overload method will be inherited by instances
of Person, as well as Manager; Manager gets __str__ from Person, which now obtains it
from the AttrDisplay coded in another module. Here is the final version of our
person.py file with these changes applied:


# File person.py (final)
from classtools import AttrDisplay              # Use generic display tool

class Person(AttrDisplay):
    """
    Create and process person records
    """
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

    def lastName(self):                          # Assumes last is last
        return self.name.split()[-1]

    def giveRaise(self, percent):                # Percent must be 0..1
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    """
    A customized Person with special requirements
    """
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)

    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)
    tom.giveRaise(.10)
    print(tom.lastName())
    print(tom)


As this is the final revision, we’ve added a few comments here to document our work:
docstrings for functional descriptions and # for smaller notes, per best-practice
conventions. When we run this code now, we see all the attributes of our objects, not just
the ones we hardcoded in the original __str__. And our final issue is resolved: because
AttrDisplay takes class names off the self instance directly, each object is shown with
the name of its closest (lowest) class. That is, tom displays as a Manager now, not a
Person, and we can finally verify that his job name has been correctly filled in by the
Manager constructor:

C:\misc> person.py
[Person: job=None, name=Bob Smith, pay=0]
[Person: job=dev, name=Sue Jones, pay=100000]
Smith Jones
[Person: job=dev, name=Sue Jones, pay=110000]
Jones
[Manager: job=mgr, name=Tom Jones, pay=60000]


This is the more useful display we were after. From a larger perspective, though, our 
attribute display class has become a general tool, which we can mix into any class by 
inheritance to leverage the display format it defines. Further, all its clients will auto- 
matically pick up future changes in our tool. Later in the book, we’ll meet even more 
powerful class tool concepts, such as decorators and metaclasses; along with Python’s 
introspection tools, they allow us to write code that augments and manages classes in 
structured and maintainable ways. 


Step 7 (Final): Storing Objects in a Database 


At this point, our work is almost complete. We now have a two-module system that not 
only implements our original design goals for representing people, but also provides a 
general attribute display tool we can use in other programs in the future. By coding 
functions and classes in module files, we’ve ensured that they naturally support reuse. 
And by coding our software as classes, we’ve ensured that it naturally supports 
extension. 


Although our classes work as planned, though, the objects they create are not real 
database records. That is, if we kill Python, our instances will disappear—they’re tran- 
sient objects in memory and are not stored in a more permanent medium like a file, so 
they won’t be available in future program runs. It turns out that it’s easy to make 
instance objects more permanent, with a Python feature called object persistence— 
making objects live on after the program that creates them exits. As a final step in this 
tutorial, let’s make our objects permanent.


Pickles and Shelves


Object persistence is implemented by three standard library modules, available in every 
Python: 


pickle 
Serializes arbitrary Python objects to and from a string of bytes 


dbm (named anydbm in Python 2.6) 
Implements an access-by-key filesystem for storing strings 


shelve 
Uses the other two modules to store Python objects on a file by key 


We met these modules very briefly in Chapter 9 when we studied file basics. They 
provide powerful data storage options. Although we can’t do them complete justice in 
this tutorial or book, they are simple enough that a brief introduction is enough to get 
you started. 


The pickle module is a sort of super-general object formatting and deformatting tool: 
given a nearly arbitrary Python object in memory, it’s clever enough to convert the 
object to a string of bytes, which it can use later to reconstruct the original object in 
memory. The pickle module can handle almost any object you can create—lists, dic- 
tionaries, nested combinations thereof, and class instances. The latter are especially 
useful things to pickle, because they provide both data (attributes) and behavior (meth- 
ods); in fact, the combination is roughly equivalent to “records” and “programs.” Be- 
cause pickle is so general, it can replace extra code you might otherwise write to create 
and parse custom text file representations for your objects. By storing an object’s pickle 
string on a file, you effectively make it permanent and persistent: simply load and un- 
pickle it later to re-create the original object. 
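A minimal round trip shows the idea. This sketch pickles a nested built-in object rather than one of our class instances, simply so that it runs anywhere without the person module; the record contents are made up for the demo:

```python
import pickle

record = {'name': 'Bob Smith', 'job': None, 'pay': 0, 'tags': ['a', ('b', 3.14)]}

data = pickle.dumps(record)            # Object => string of bytes
print(type(data))                      # <class 'bytes'>

copy = pickle.loads(data)              # Bytes => new, equal object
print(copy == record, copy is record)  # True False
```

The same bytes could be written to a file opened in binary mode and loaded back in a later program run.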


Although it’s easy to use pickle by itself to store objects in simple flat files and load
them from there later, the shelve module provides an extra layer of structure that allows
you to store pickled objects by key. shelve translates an object to its pickled string with
pickle and stores that string under a key in a dbm file; when later loading, shelve fetches
the pickled string by key and re-creates the original object in memory with pickle. This
is all quite a trick, but to your script a shelve* of pickled objects looks just like a
dictionary: you index by key to fetch, assign to keys to store, and use dictionary tools
such as len, in, and dict.keys to get information. Shelves automatically map dictionary
operations to objects stored in a file.


In fact, to your script the only coding difference between a shelve and a normal
dictionary is that you must open shelves initially and must close them after making changes.
The net effect is that a shelve provides a simple database for storing and fetching native
Python objects by keys, and thus makes them persistent across program runs. It does
not support query tools such as SQL, and it lacks some advanced features found in
enterprise-level databases (such as true transaction processing), but native Python
objects stored on a shelve may be processed with the full power of the Python language
once they are fetched back by key.


* Yes, we use “shelve” as a noun in Python, much to the chagrin of a variety of editors I’ve worked with over
the years, both electronic and human.
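To make the dictionary-like interface concrete, here is a small sketch. The filename demodb, the keys, and the use of a temporary directory are all choices made for this demo so that it cleans up after itself; plain dictionaries stand in for our class instances:

```python
import os, shelve, tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demodb')

    db = shelve.open(path)                       # Open, then use like a dict
    db['bob'] = {'name': 'Bob Smith', 'pay': 0}
    db['sue'] = {'name': 'Sue Jones', 'pay': 100000}
    db.close()                                   # Close after making changes

    db = shelve.open(path)                       # Reopen: objects persisted
    count = len(db)
    found = 'bob' in db
    keys = sorted(db.keys())
    pay = db['sue']['pay']
    db.close()

print(count, found, keys, pay)                   # 2 True ['bob', 'sue'] 100000
```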


Storing Objects on a Shelve Database 


Pickling and shelves are somewhat advanced topics, and we won’t go into all their 
details here; you can read more about them in the standard library manuals, as well as 
application-focused books such as Programming Python. This is all simpler in Python 
than in English, though, so let’s jump into some code. 


Let’s write a new script that throws objects of our classes onto a shelve. In your text 
editor, open a new file we’ll call makedb.py. Since this is a new file, we’ll need to import 
our classes in order to create a few instances to store. We used from to load a class at 
the interactive prompt earlier, but really, as with functions and other variables, there 
are two ways to load a class from a file (class names are variables like any other, and 
not at all magic in this context): 


import person                          # Load class with import
bob = person.Person(...)               # Go through module name

from person import Person              # Load class with from
bob = Person(...)                      # Use name directly


We'll use from to load in our script, just because it’s a bit less to type. Copy or retype 
this code to make instances of our classes in the new script, so we have something to 
store (this is a simple demo, so we won’t worry about the test-code redundancy here). 
Once we have some instances, it’s almost trivial to store them on a shelve. We simply 
import the shelve module, open a new shelve with an external filename, assign the 
objects to keys in the shelve, and close the shelve when we’re done because we’ve made 
changes: 


# File makedb.py: store Person objects on a shelve database

from person import Person, Manager               # Load our classes
bob = Person('Bob Smith')                        # Re-create objects to be stored
sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 50000)

import shelve
db = shelve.open('persondb')                     # Filename where objects are stored
for object in (bob, sue, tom):                   # Use object's name attr as key
    db[object.name] = object                     # Store object on shelve by key
db.close()                                       # Close after making changes


Notice how we assign objects to the shelve using their own names as keys. This is just 
for convenience; in a shelve, the key can be any string, including one we might create 
to be unique using tools such as process IDs and timestamps (available in the os and 
time standard library modules). The only rule is that the keys must be strings and should 
be unique, since we can store just one object per key (though that object can be a list
or dictionary containing many objects). The values we store under keys, though, can 
be Python objects of almost any sort: built-in types like strings, lists, and dictionaries, 
as well as user-defined class instances, and nested combinations of all of these. 
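If you do want generated keys, something along these lines would work. This is a sketch only: the helper name makeKey and the key layout are invented here, and a per-process counter is added because a process ID plus a timestamp alone could repeat if two keys are made within the same clock tick:

```python
import itertools, os, time

_serial = itertools.count()                      # Per-process counter

def makeKey(prefix='rec'):                       # Hypothetical helper name
    "Build a string key from the process ID, the current time, and a counter"
    return '%s-%d-%d-%d' % (prefix, os.getpid(), int(time.time()), next(_serial))

first, second = makeKey(), makeKey()
print(first)                                     # Varies per run
print(first != second)                           # True
```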


That’s all there is to it—if this script has no output when run, it means it probably 
worked; we’re not printing anything, just creating and storing objects: 


C:\misc> makedb.py 


Exploring Shelves Interactively 


At this point, there are one or more real files in the current directory whose names all
start with “persondb”. The actual files created can vary per platform, and just like in
the built-in open function, the filename in shelve.open() is relative to the current
working directory unless it includes a directory path. Wherever they are stored, these files
implement a keyed-access file that contains the pickled representation of our three
Python objects. Don’t delete these files: they are your database, and are what you’ll
need to copy or transfer when you back up or move your storage.


You can look at the shelve’s files if you want to, either from Windows Explorer or the 
Python shell, but they are binary hash files, and most of their content makes little sense 
outside the context of the shelve module. With Python 3.0 and no extra software in- 
stalled, our database is stored in three files (in 2.6, it’s just one file, persondb, because 
the bsddb extension module is preinstalled with Python for shelves; in 3.0, bsddb is a 
third-party open source add-on): 


# Directory listing module: verify files are present

>>> import glob
>>> glob.glob('person*')
['person.py', 'person.pyc', 'persondb.bak', 'persondb.dat', 'persondb.dir']

# Type the file: text mode for string, binary mode for bytes

>>> print(open('persondb.dir').read())
'Tom Jones', (1024, 91)
...more omitted...

>>> print(open('persondb.dat', 'rb').read())
b'\x80\x03cperson\nPerson\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00payq\x03K...
...more omitted...


This content isn’t impossible to decipher, but it can vary on different platforms and 
doesn’t exactly qualify as a user-friendly database interface! To verify our work better, 
we can write another script, or poke around our shelve at the interactive prompt. Be- 
cause shelves are Python objects containing Python objects, we can process them with 
normal Python syntax and development modes. Here, the interactive prompt effectively 
becomes a database client: 
>>> import shelve
>>> db = shelve.open('persondb')                # Reopen the shelve

>>> len(db)                                     # Three 'records' stored
3
>>> list(db.keys())                             # keys is the index
['Tom Jones', 'Sue Jones', 'Bob Smith']         # list to make a list in 3.0

>>> bob = db['Bob Smith']                       # Fetch bob by key
>>> print(bob)                                  # Runs __str__ from AttrDisplay
[Person: job=None, name=Bob Smith, pay=0]

>>> bob.lastName()                              # Runs lastName from Person
'Smith'

>>> for key in db:                              # Iterate, fetch, print
        print(key, '=>', db[key])

Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]

>>> for key in sorted(db):
        print(key, '=>', db[key])               # Iterate by sorted keys

Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]


Notice that we don’t have to import our Person or Manager classes here in order to load 
or use our stored objects. For example, we can call bob’s lastName method freely, and 
get his custom print display format automatically, even though we don’t have his 
Person class in our scope here. This works because when Python pickles a class instance, 
it records its self instance attributes, along with the name of the class it was created 
from and the module where the class lives. When bob is later fetched from the shelve 
and unpickled, Python will automatically reimport the class and link bob to it. 
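You can see this for yourself by peeking inside a pickle string. The sketch below uses argparse.Namespace from the standard library as a stand-in for Person, only because it is a simple attribute-holding class that is importable everywhere; the attribute values are made up:

```python
import argparse, pickle

rec = argparse.Namespace(name='Bob Smith', pay=0)   # Stand-in for a Person
data = pickle.dumps(rec)

print(b'argparse' in data, b'Namespace' in data)    # True True: module and class names
print(b'Bob Smith' in data)                         # True: instance attribute values
print(pickle.loads(data) == rec)                    # True: class relinked on load
```

The pickle holds the instance’s attributes plus the names needed to find its class again; the class’s code itself is not stored.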


The upshot of this scheme is that class instances automatically acquire all their class 
behavior when they are loaded in the future. We have to import our classes only to 
make new instances, not to process existing ones. Although a deliberate feature, this 
scheme has somewhat mixed consequences: 


* The downside is that classes and their module’s files must be importable when an
instance is later loaded. More formally, pickleable classes must be coded at the top
level of a module file accessible from a directory listed on the sys.path module
search path (and shouldn’t live in a top-level script file’s __main__ module unless
they’re always in that module when used). Because of this external module file
requirement, some applications choose to pickle simpler objects such as
dictionaries or lists, especially if they are to be transferred across the Internet.


* The upside is that changes in a class’s source code file are automatically picked up
when instances of the class are loaded again; there is often no need to update stored
objects themselves, since updating their class’s code changes their behavior.


Shelves also have well-known limitations (the database suggestions at the end of this 
chapter mention a few of these). For simple object storage, though, shelves and pickles 
are remarkably easy-to-use tools. 


Updating Objects on a Shelve 


Now for one last script: let’s write a program that updates an instance (record) each
time it runs, to prove the point that our objects really are persistent (i.e., that their
current values are available every time a Python program runs). The following file,
updatedb.py, prints the database and gives a raise to one of our stored objects each time.
If you trace through what’s going on here, you’ll notice that we’re getting a lot of utility
“for free”: printing our objects automatically employs the general __str__ overloading
method, and we give raises by calling the giveRaise method we wrote earlier. This all
“just works” for objects based on OOP’s inheritance model, even when they live in a file:


# File updatedb.py: update Person object on database

import shelve
db = shelve.open('persondb')                     # Reopen shelve with same filename

for key in sorted(db):                           # Iterate to display database objects
    print(key, '\t=>', db[key])                  # Prints with custom format

sue = db['Sue Jones']                            # Index by key to fetch
sue.giveRaise(.10)                               # Update in memory using class method
db['Sue Jones'] = sue                            # Assign to key to update in shelve
db.close()                                       # Close after making changes


Because this script prints the database when it starts up, we have to run it a few times 
to see our objects change. Here it is in action, displaying all records and increasing 
sue’s pay each time it’s run (it’s a pretty good script for sue...): 


c:\misc> updatedb.py 


Bob Smith => [Person: job=None, name=Bob Smith, pay=0] 
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000] 
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000] 


c:\misc> updatedb.py 


Bob Smith => [Person: job=None, name=Bob Smith, pay=0] 
Sue Jones => [Person: job=dev, name=Sue Jones, pay=110000] 
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000] 


c:\misc> updatedb.py 


Bob Smith => [Person: job=None, name=Bob Smith, pay=0] 
Sue Jones => [Person: job=dev, name=Sue Jones, pay=121000] 
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000] 


c:\misc> updatedb.py


Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=133100] 
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000] 
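The assignment back to the key in updatedb.py is the step that makes the change stick. As a sketch of why it matters, the following demo (with an invented demodb filename, a temporary directory, and a plain dictionary in place of a Person) changes a fetched object and checks the shelve before and after reassigning, using shelve’s default settings:

```python
import os, shelve, tempfile

with tempfile.TemporaryDirectory() as folder:
    db = shelve.open(os.path.join(folder, 'demodb'))
    db['sue'] = {'pay': 100000}

    rec = db['sue']                    # Fetch: unpickles a fresh copy
    rec['pay'] = 110000                # Changes the in-memory copy only
    before = db['sue']['pay']          # Shelve still holds the old value

    db['sue'] = rec                    # Assign to key to store the change
    after = db['sue']['pay']
    db.close()

print(before, after)                   # 100000 110000
```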


Again, what we see here is a product of the shelve and pickle tools we get from Python, 
and of the behavior we coded in our classes ourselves. And once again, we can verify 
our script’s work at the interactive prompt (the shelve’s equivalent of a database client): 

c:\misc> python
>>> import shelve
>>> db = shelve.open('persondb')                # Reopen database
>>> rec = db['Sue Jones']                       # Fetch object by key
>>> print(rec)
[Person: job=dev, name=Sue Jones, pay=146410]
>>> rec.lastName()
'Jones'
>>> rec.pay
146410


For another example of object persistence in this book, see the sidebar in Chapter 30
titled “Why You Will Care: Classes and Persistence” on page 744. It stores a somewhat
larger composite object in a flat file with pickle instead of shelve, but the effect
is similar. For more details on both pickles and shelves, see other books or Python’s
manuals.


Future Directions 


And that’s a wrap for this tutorial. At this point, you’ve seen all the basics of Python’s 
OOP machinery in action, and you’ve learned ways to avoid redundancy and its asso- 
ciated maintenance issues in your code. You’ve built full-featured classes that do real 
work. As an added bonus, you’ve made them real database records by storing them in 
a Python shelve, so their information lives on persistently. 


There is much more we could explore here, of course. For example, we could extend
our classes to make them more realistic, add new kinds of behavior to them, and so on.
Giving a raise, for instance, should in practice verify that pay increase rates are between
zero and one, an extension we’ll add when we meet decorators later in this book. You
might also mutate this example into a personal contacts database, by changing the state
information stored on objects, as well as the class methods used to process it. We’ll
leave this as a suggested exercise open to your imagination.
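Until then, a plain test inside the method gives the general idea. This is a sketch of one possible check, not the decorator-based version promised for later; the choice of ValueError and its message are assumptions, and Person is trimmed to stand alone:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        if not 0.0 <= percent <= 1.0:            # Reject rates outside 0..1
            raise ValueError('percent must be between 0 and 1')
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', job='dev', pay=100000)
sue.giveRaise(.10)
print(sue.pay)                                   # 110000

try:
    sue.giveRaise(10)                            # Probably meant .10
except ValueError as exc:
    print('rejected:', exc)
```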


We could also expand our scope to use tools that either come with Python or are freely 
available in the open source world: 


GUIs 
As is, we can only process our database with the interactive prompt’s command- 
based interface, and scripts. We could also work on expanding our object data- 
base’s usability by adding a graphical user interface for browsing and updating its 
records. GUIs can be built portably with either Python’s tkinter (Tkinter in 2.6) 
standard library support, or third-party toolkits such as WxPython and PyQt.
tkinter ships with Python, lets you build simple GUIs quickly, and is ideal for 
learning GUI programming techniques; WxPython and PyQt tend to be more 
complex to use but often produce higher-grade GUIs in the end. 


Websites 

Although GUIs are convenient and fast, the Web is hard to beat in terms of
accessibility. We might also implement a website for browsing and updating records,
instead of or in addition to GUIs and the interactive prompt. Websites can be
constructed with either basic CGI scripting tools that come with Python, or
full-featured third-party web frameworks such as Django, TurboGears, Pylons,
web2py, Zope, or Google’s App Engine. On the Web, your data can still be stored
in a shelve, pickle file, or other Python-based medium; the scripts that process it
are simply run automatically on a server in response to requests from web browsers
and other clients, and they produce HTML to interact with a user, either directly
or by interfacing with framework APIs.


Web services 
Although web clients can often parse information in the replies from websites (a 
technique colorfully known as “screen scraping”), we might go further and provide 
a more direct way to fetch records on the Web via a web services interface such as 
SOAP or XML-RPC calls—APIs supported by either Python itself or the third-party 
open source domain. Such APIs return data in a more direct form, rather than 
embedded in the HTML of a reply page. 


Databases 

If our database becomes higher-volume or critical, we might eventually move it 
from shelves to a more full-featured storage mechanism such as the open source 
ZODB object-oriented database system (OODB), or a more traditional SQL-based 
relational database system such as MySQL, Oracle, PostgreSQL, or SQLite. Python 
itself comes with the in-process SQLite database system built-in, but other open 
source options are freely available on the Web. ZODB, for example, is similar to 
Python’s shelve but addresses many of its limitations, supporting larger databases, 
concurrent updates, transaction processing, and automatic write-through on in- 
memory changes. SQL-based systems like MySQL offer enterprise-level tools for 
database storage and may be directly used from within a Python script.


ORMs 

If we do migrate to a relational database system for storage, we don’t have to sac- 
rifice Python’s OOP tools. Object-relational mappers (ORMs) like SQLObject and 
SQLAlchemy can automatically map relational tables and rows to and from Python 
classes and instances, such that we can process the stored data using normal Python 
class syntax. This approach provides an alternative to OODBs like shelve and 
ZODB and leverages the power of both relational databases and Python’s class 
model.


While I hope this introduction whets your appetite for future exploration, all of these
topics are of course far beyond the scope of this tutorial and this book at large. If you 
want to explore any of them on your own, see the Web, Python’s standard library 
manuals, and application-focused books such as Programming Python. In the latter I 
pick up this example where we’ve stopped here, showing how to add both a GUI and 
a website on top of the database to allow for browsing and updating instance records. 
I hope to see you there eventually, but first, let’s return to class fundamentals and finish 
up the rest of the core Python language story. 
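Just to give a flavor of the relational option mentioned in the Databases entry above, here is a minimal sketch using the sqlite3 module that ships with Python. The table name, column names, and in-memory database are assumptions made for this demo, and the rows simply mirror our three example records:

```python
import sqlite3

conn = sqlite3.connect(':memory:')               # In-memory database for the demo
conn.execute('create table people (name text primary key, job text, pay integer)')

rows = [('Bob Smith', None, 0),
        ('Sue Jones', 'dev', 100000),
        ('Tom Jones', 'mgr', 50000)]
conn.executemany('insert into people values (?, ?, ?)', rows)

# Unlike a shelve, a relational database can be queried with SQL
query = 'select name from people where pay > ? order by name'
names = [row[0] for row in conn.execute(query, (40000,))]
print(names)                                     # ['Sue Jones', 'Tom Jones']
conn.close()
```

Here records are plain rows rather than class instances; mapping them back to objects such as Person is the job an ORM would take on.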


Chapter Summary 


In this chapter, we explored all the fundamentals of Python classes and OOP in action, 
by building upon a simple but real example, step by step. We added constructors, 
methods, operator overloading, customization with subclasses, and introspection 
tools, and we met other concepts (such as composition, delegation, and polymorphism) 
along the way. 


In the end, we took objects created by our classes and made them persistent by storing 
them on a shelve object database—an easy-to-use system for saving and retrieving na- 
tive Python objects by key. While exploring class basics, we also encountered multiple 
ways to factor our code to reduce redundancy and minimize future maintenance costs. 
Finally, we briefly previewed ways to extend our code with application-programming 
tools such as GUIs and databases, covered in follow-up books. 


In the next chapters of this part of the book we’ll return to our study of the details 
behind Python’s class model and investigate its application to some of the design con- 
cepts used to combine classes in larger programs. Before we move ahead, though, let’s 
work through this chapter’s quiz to review what we covered here. Since we’ve already 
done a lot of hands-on work in this chapter, we’ll close with a set of mostly theory- 
oriented questions designed to make you trace through some of the code and ponder 
some of the bigger ideas behind it. 


Test Your Knowledge: Quiz 


1. When we fetch a Manager object from the shelve and print it, where does the display 
format logic come from? 

2. When we fetch a Person object from a shelve without importing its module, how 
does the object know that it has a giveRaise method that we can call? 


3. Why is it so important to move processing into methods, instead of hardcoding it 
outside the class? 


4. Why is it better to customize by subclassing rather than copying the original and 
modifying? 

5. Why is it better to call back to a superclass method to run default actions, instead 
of copying and modifying its code in a subclass? 


6. Why is it better to use tools like __dict__ that allow objects to be processed 
generically than to write more custom code for each type of class? 


7. In general terms, when might you choose to use object embedding and composition 
instead of inheritance? 


8. How might you modify the classes in this chapter to implement a personal contacts 
database in Python? 


Test Your Knowledge: Answers 


1. In the final version of our classes, Manager ultimately inherits its __str__ printing 
method from AttrDisplay in the separate classtools module. Manager doesn’t have 
one itself, so the inheritance search climbs to its Person superclass; because there 
is no __str__ there either, the search climbs higher and finds it in AttrDisplay. The 
class names listed in parentheses in a class statement’s header line provide 
the links to higher superclasses. 


2. Shelves (really, the pickle module they use) automatically relink an instance to the 
class it was created from when that instance is later loaded back into memory. 
Python reimports the class from its module internally, creates an instance with its 
stored attributes, and sets the instance’s __class__ link to point to its original class. 
This way, loaded instances automatically obtain all their original methods (like 
lastName, giveRaise, and __str__), even if we have not imported the instance’s class 
into our scope. 


3. It’s important to move processing into methods so that there is only one copy to 
change in the future, and so that the methods can be run on any instance. This is 
Python’s notion of encapsulation—wrapping up logic behind interfaces, to better 
support future code maintenance. If you don’t do so, you create code redundancy 
that can multiply your work effort as the code evolves in the future. 


4. Customizing with subclasses reduces development effort. In OOP, we code by 
customizing what has already been done, rather than copying or changing existing 
code. This is the real “big idea” in OOP—because we can easily extend our prior 
work by coding new subclasses, we can leverage what we’ve already done. This is 
much better than either starting from scratch each time, or introducing multiple 
redundant copies of code that may all have to be updated in the future. 


5. Copying and modifying code doubles your potential work effort in the future, 
regardless of the context. If a subclass needs to perform default actions coded in a 
superclass method, it’s much better to call back to the original through the super- 
class’s name than to copy its code. This also holds true for superclass constructors. 
Again, copying code creates redundancy, which is a major issue as code evolves. 


6. Generic tools can avoid hardcoded solutions that must be kept in sync with the 
rest of the class as it evolves over time. A generic __str__ print method, for example, 
need not be updated each time a new attribute is added to instances in an 
__init__ constructor. In addition, a generic print method inherited by all classes 
only appears, and need only be modified, in one place—changes in the generic 
version are picked up by all classes that inherit from the generic class. Again, elim- 
inating code redundancy cuts future development effort; that’s one of the primary 
assets classes bring to the table. 


7. Inheritance is best at coding extensions based on direct customization (like our 
Manager specialization of Person). Composition is well suited to scenarios where 
multiple objects are aggregated into a whole and directed by a controller layer class. 
Inheritance passes calls up to reuse, and composition passes down to delegate. 
Inheritance and composition are not mutually exclusive; often, the objects em- 
bedded in a controller are themselves customizations based upon inheritance. 


8. The classes in this chapter could be used as boilerplate “template” code to 
implement a variety of types of databases. Essentially, you can repurpose them by 
modifying the constructors to record different attributes and providing whatever 
methods are appropriate for the target application. For instance, you might use 
attributes such as name, address, birthday, phone, email, and so on for a contacts 
database, and methods appropriate for this purpose. A method named sendmail, 
for example, might use Python’s standard library smtplib module to send an email 
to one of the contacts automatically when called (see Python’s manuals or appli- 
cation-level books for more details on such tools). The AttrDisplay tool we wrote 
here could be used verbatim to print your objects, because it is intentionally ge- 
neric. Most of the shelve database code here can be used to store your objects, too, 
with minor changes. 


CHAPTER 28 
Class Coding Details 


If you haven’t quite gotten all of Python OOP yet, don’t worry; now that we’ve had a 
quick tour, we’re going to dig a bit deeper and study the concepts introduced earlier in 
further detail. In this and the following chapter, we’ll take another look at class me- 
chanics. Here, we’re going to study classes, methods, and inheritance, formalizing and 
expanding on some of the coding ideas introduced in Chapter 26. Because the class is 
our last namespace tool, we’ll summarize Python’s namespace concepts here as well. 


The next chapter continues this in-depth second pass over class mechanics by covering 
one specific aspect: operator overloading. Besides presenting the details, this chapter 
and the next also give us an opportunity to explore some larger classes than those we 
have studied so far. 


The class Statement 


Although the Python class statement may seem similar to tools in other OOP languages 
on the surface, on closer inspection, it is quite different from what some programmers 
are used to. For example, as in C++, the class statement is Python’s main OOP tool, 
but unlike in C++, Python’s class is not a declaration. Like a def, a class statement is 
an object builder, and an implicit assignment—when run, it generates a class object 
and stores a reference to it in the name used in the header. Also like a def, a class 
statement is true executable code—your class doesn’t exist until Python reaches and 
runs the class statement that defines it (typically while importing the module it is coded 
in, but not before). 
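
As a minimal sketch of this runtime behavior, the following code runs one of two class statements depending on a flag. The names Logger, Alias, and verbose are made up for illustration and are not part of this book’s examples:

```python
# A class statement is executable code: it runs when reached and
# assigns a new class object to the name in its header.
verbose = True                       # Hypothetical flag, for illustration only

if verbose:
    class Logger:                    # Runs only if this branch is taken
        def log(self, text):
            return 'LOG: ' + text
else:
    class Logger:                    # Same name, different class object
        def log(self, text):
            return text

Alias = Logger                       # The name is just a reference to the object
print(Alias is Logger)               # True: both names share one class object
print(Logger().log('spam'))          # LOG: spam
```

Because the class statement is an assignment, the name Logger can be copied to other names, or rebound to a different class later, just like any other variable.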


General Form 


class is a compound statement, with a body of indented statements typically appearing 
under the header. In the header, superclasses are listed in parentheses after the class 
name, separated by commas. Listing more than one superclass leads to multiple in- 
heritance (which we’ll discuss more formally in Chapter 30). Here is the statement’s 
general form: 


class <name>(superclass,...):          # Assign to name 
    data = value                       # Shared class data 
    def method(self,...):              # Methods 
        self.member = value            # Per-instance data 


Within the class statement, any assignments generate class attributes, and specially 
named methods overload operators; for instance, a function called __init__ is called 
at instance object construction time, if defined. 


Example 


As we’ve seen, classes are mostly just namespaces—that is, tools for defining names 
(i.e., attributes) that export data and logic to clients. So, how do you get from the 
class statement to a namespace? 


Here’s how. Just like in a module file, the statements nested in a class statement body 
create its attributes. When Python executes a class statement (not a call to a class), it 
runs all the statements in its body, from top to bottom. Assignments that happen during 
this process create names in the class’s local scope, which become attributes in the 
associated class object. Because of this, classes resemble both modules and functions: 


• Like functions, class statements are local scopes where names created by nested 
  assignments live. 


• Like names in a module, names assigned in a class statement become attributes 
  in a class object. 


The main distinction for classes is that their namespaces are also the basis of inheritance 
in Python; attributes that are referenced but not found in a class or instance object are 
fetched from other classes. 


Because class is a compound statement, any sort of statement can be nested inside its 
body—print, =, if, def, and so on. All the statements inside the class statement run 
when the class statement itself runs (not when the class is later called to make an 
instance). Assigning names inside the class statement makes class attributes, and 
nested defs make class methods, but other assignments make attributes, too. 
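
The following sketch illustrates the point, under the assumption that we record what runs in a list. The names trace and Demo are hypothetical, chosen only for this illustration:

```python
# Statements in a class body run once, when the class statement itself runs.
trace = []                               # Hypothetical log, for illustration only

class Demo:
    trace.append('class body running')   # Any statement may appear here
    limit = 10
    if limit > 5:                        # Logic can decide which attributes exist
        size = 'large'
    else:
        size = 'small'
    def show(self):
        return self.size

trace.append('class statement finished')
d = Demo()                               # Calling the class does not rerun the body
print(trace)                             # ['class body running', 'class statement finished']
print(Demo.limit, Demo.size, d.show())   # 10 large large
```

The names limit and size end up as class attributes because they were assigned in the class body; the if statement simply chose which assignment ran.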


For example, assignments of simple nonfunction objects to class attributes produce 
data attributes, shared by all instances: 


>>> class SharedData: 
        spam = 42                      # Generates a class data attribute 

>>> x = SharedData()                   # Make two instances 
>>> y = SharedData() 
>>> x.spam, y.spam                     # They inherit and share 'spam' 
(42, 42) 


Here, because the name spam is assigned at the top level of a class statement, it is 
attached to the class and so will be shared by all instances. We can change it by going 
through the class name, and we can refer to it through either instances or the class.* 


>>> SharedData.spam = 99 
>>> x.spam, y.spam, SharedData.spam 
(99, 99, 99) 


Such class attributes can be used to manage information that spans all the instances— 
a counter of the number of instances generated, for example (we’ll expand on this idea 
by example in Chapter 31). Now, watch what happens if we assign the name spam 
through an instance instead of the class: 


>>> x.spam = 88 
>>> x.spam, y.spam, SharedData.spam 
(88, 99, 99) 


Assignments to instance attributes create or change the names in the instance, rather 
than in the shared class. More generally, inheritance searches occur only on attribute 
references, not on assignment: assigning to an object’s attribute always changes that 
object, and no other.† For example, y.spam is looked up in the class by inheritance, but 
the assignment to x.spam attaches a name to x itself. 
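
One way to see where each name actually lives is to inspect the namespace dictionaries directly. This is a small sketch that reuses the SharedData class from above and the __dict__ attribute met in the prior chapter; the del step at the end is an addition for illustration:

```python
class SharedData:
    spam = 42                            # Class attribute, shared by all instances

x = SharedData()
y = SharedData()
x.spam = 88                              # Assignment creates a name in x only

print(x.__dict__)                        # {'spam': 88}: x now has its own spam
print(y.__dict__)                        # {}: y still inherits from the class
print('spam' in SharedData.__dict__)     # True: the class copy is unchanged

del x.spam                               # Remove the instance's own name
print(x.spam)                            # 42: the lookup finds the class attribute again
```

Deleting the instance’s name does not touch the class; it merely uncovers the class attribute that the instance name had been hiding.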


Here’s a more comprehensive example of this behavior that stores the same name in 
two places. Suppose we run the following class: 


class MixedNames:                            # Define class 
    data = 'spam'                            # Assign class attr 
    def __init__(self, value):               # Assign method name 
        self.data = value                    # Assign instance attr 
    def display(self): 
        print(self.data, MixedNames.data)    # Instance attr, class attr 


This class contains two defs, which bind class attributes to method functions. It also 
contains an = assignment statement; because this assignment assigns the name data 
inside the class, it lives in the class’s local scope and becomes an attribute of the class 
object. Like all class attributes, this data is inherited and shared by all instances of the 
class that don’t have data attributes of their own. 


When we make instances of this class, the name data is attached to those instances by 
the assignment to self.data in the constructor method: 


>>> x = MixedNames(1)            # Make two instance objects 
>>> y = MixedNames(2)            # Each has its own data 
>>> x.display(); y.display()     # self.data differs, MixedNames.data is the same 
1 spam 
2 spam 


* If you’ve used C++ you may recognize this as similar to the notion of C++’s “static” data members: members 
that are stored in the class, independent of instances. In Python, it’s nothing special: all class attributes are 
just names assigned in the class statement, whether they happen to reference functions (C++’s “methods”) 
or something else (C++’s “members”). In Chapter 31, we’ll also meet Python static methods (akin to those 
in C++), which are just self-less functions that usually process class attributes. 


† Unless the class has redefined the attribute assignment operation to do something unique with the 
__setattr__ operator overloading method (discussed in Chapter 29). 


The net result is that data lives in two places: in the instance objects (created by the 
self.data assignment in __init__), and in the class from which they inherit names 
(created by the data assignment in the class). The class’s display method prints both 
versions, by first qualifying the self instance, and then the class. 


By using these techniques to store attributes in different objects, we determine their 
scope of visibility. When attached to classes, names are shared; in instances, names 
record per-instance data, not shared behavior or data. Although inheritance searches 
look up names for us, we can always get to an attribute anywhere in a tree by accessing 
the desired object directly. 


In the preceding example, for instance, specifying x.data or self.data will return an 
instance name, which normally hides the same name in the class; however, 
MixedNames.data grabs the class name explicitly. We’ll see various roles for such coding 
patterns later; the next section describes one of the most common. 


Methods 


Because you already know about functions, you also know about methods in classes. 
Methods are just function objects created by def statements nested in a class state- 
ment’s body. From an abstract perspective, methods provide behavior for instance 
objects to inherit. From a programming perspective, methods work in exactly the same 
way as simple functions, with one crucial exception: a method’s first argument always 
receives the instance object that is the implied subject of the method call. 


In other words, Python automatically maps instance method calls to class method 
functions as follows. Method calls made through an instance, like this: 


instance.method(args...) 


are automatically translated to class method function calls of this form: 


class.method(instance, args...) 


where the class is determined by locating the method name using Python’s inheritance 
search procedure. In fact, both call forms are valid in Python. 


Besides the normal inheritance of method attribute names, the special first argument 
is the only real magic behind method calls. In a method coded in a class, the first argument is 
usually called self by convention (technically, only its position is significant, not its 
name). This argument provides methods with a hook back to the instance that is the 
subject of the call—because classes generate many instance objects, they need to use 
this argument to manage data that varies per instance. 


C++ programmers may recognize Python’s self argument as being similar to C++’s 
this pointer. In Python, though, self is always explicit in your code: methods must 
always go through self to fetch or change attributes of the instance being processed 
by the current method call. This explicit nature of self is by design—the presence of 
this name makes it obvious that you are using instance attribute names in your script, 
not names in the local or global scope. 
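
To make the distinction concrete, here is a short sketch that contrasts a local variable with an instance attribute of the same name. The Counter class and its bump method are hypothetical, invented for this illustration:

```python
class Counter:
    def __init__(self):
        self.count = 0               # Instance attribute: retained after the call

    def bump(self):
        count = 100                  # Plain local variable: discarded on return
        self.count += 1              # Must go through self to change the instance
        return count

c = Counter()
c.bump()
c.bump()
print(c.count)                       # 2: only self.count was retained
```

The unqualified name count inside bump follows normal function scope rules and never touches the instance; only the self.count form reaches per-instance data.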


Method Example 


To clarify these concepts, let’s turn to an example. Suppose we define the following 
class: 


class NextClass:                         # Define class 
    def printer(self, text):             # Define method 
        self.message = text              # Change instance 
        print(self.message)              # Access instance 


The name printer references a function object; because it’s assigned in the class state- 
ment’s scope, it becomes a class object attribute and is inherited by every instance made 
from the class. Normally, because methods like printer are designed to process 
instances, we call them through instances: 


>>> x = NextClass()                      # Make instance 
>>> x.printer('instance call')           # Call its method 
instance call 
>>> x.message                            # Instance changed 
'instance call' 


When we call the method by qualifying an instance like this, printer is first located by 
inheritance, and then its self argument is automatically assigned the instance object 
(x); the text argument gets the string passed at the call ('instance call'). Notice that 
because Python automatically passes the first argument to self for us, we only actually 
have to pass in one argument. Inside printer, the name self is used to access or set 
per-instance data because it refers back to the instance currently being processed. 


Methods may be called in one of two ways—through an instance, or through the class 
itself. For example, we can also call printer by going through the class name, provided 
we pass an instance to the self argument explicitly: 


>>> NextClass.printer(x, 'class call')   # Direct class call 
class call 
>>> x.message                            # Instance changed again 
'class call' 


Calls routed through the instance and the class have the exact same effect, as long as 
we pass the same instance object ourselves in the class form. By default, in fact, you get 
an error message if you try to call a method without any instance: 


>>> NextClass.printer('bad call') 
TypeError: unbound method printer() must be called with NextClass instance... 
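
The message text shown above is the Python 2.X form. In Python 3.X, a method fetched from its class is a simple function, so the same call fails because it is one argument short, and the message wording differs. The following sketch catches the error instead of depending on its exact text:

```python
class NextClass:
    def printer(self, text):
        self.message = text
        print(self.message)

try:
    NextClass.printer('bad call')        # No instance passed: one argument short
except TypeError as exc:
    error = exc                          # Saved so we can inspect it afterward
    print('TypeError:', exc)

x = NextClass()
NextClass.printer(x, 'class call')       # Passing the instance manually works
```

Either way, the call raises a TypeError unless an instance is supplied for self.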


Calling Superclass Constructors 


Methods are normally called through instances. Calls to methods through a class, 
though, do show up in a variety of special roles. One common scenario involves the 
constructor method. The __init__ method, like all attributes, is looked up by 
inheritance. This means that at construction time, Python locates and calls just one 
__init__. If subclass constructors need to guarantee that superclass construction-time 
logic runs, too, they generally must call the superclass’s __init__ method explicitly 
through the class: 


class Super: 
    def __init__(self, x): 
        pass                             # ...default code... 

class Sub(Super): 
    def __init__(self, x, y): 
        Super.__init__(self, x)          # Run superclass __init__ 
        pass                             # ...custom code: do my init actions 

I = Sub(1, 2) 


This is one of the few contexts in which your code is likely to call an operator over- 
loading method directly. Naturally, you should only call the superclass constructor this 
way if you really want it to run—without the call, the subclass replaces it completely. 
For a more realistic illustration of this technique in action, see the Manager class example 
in the prior chapter’s tutorial.‡ 
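
As a runnable sketch of the difference the explicit call makes, consider the following classes. They are loosely modeled on the prior chapter’s Person and Manager, but simplified; the Forgetful class and the attribute names are assumptions made up for this illustration:

```python
class Person:
    def __init__(self, name):
        self.name = name                 # Superclass construction-time logic

class Manager(Person):
    def __init__(self, name, dept):
        Person.__init__(self, name)      # Run the superclass constructor explicitly
        self.dept = dept                 # Then add subclass-specific setup

class Forgetful(Person):
    def __init__(self, name, dept):
        self.dept = dept                 # Superclass constructor never runs

m = Manager('Bob', 'sales')
f = Forgetful('Sue', 'dev')
print(m.name, m.dept)                    # Bob sales
print(hasattr(f, 'name'))                # False: Person.__init__ was replaced
```

Because inheritance finds only the lowest __init__, the Forgetful instance never gets a name attribute; the superclass logic runs only when the subclass asks for it.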


Other Method Call Possibilities 


This pattern of calling methods through a class is the general basis of extending (instead 
of completely replacing) inherited method behavior. In Chapter 31, we’ll also meet a 
new option added in Python 2.2, static methods, that allow you to code methods that 
do not expect instance objects in their first arguments. Such methods can act like simple 
instanceless functions, with names that are local to the classes in which they are coded, 
and may be used to manage class data. A related concept, the class method, receives a 
class when called instead of an instance and can be used to manage per-class data. These 
are advanced and optional extensions, though; normally, you must always pass an 
instance to a method, whether it is called through an instance or a class. 
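
As a brief preview only, and assuming the built-in staticmethod and classmethod tools applied with the @ decorator syntax covered later in the book, the two variants might be sketched as follows. The Spam class and its method names are hypothetical:

```python
class Spam:
    count = 0                            # Class data shared by all instances

    def __init__(self):
        Spam.count += 1                  # Normal method: receives an instance

    @staticmethod
    def total():                         # No instance or class argument
        return Spam.count

    @classmethod
    def label(cls):                      # Receives the class, not an instance
        return cls.__name__ + ' x' + str(cls.count)

a, b = Spam(), Spam()
print(Spam.total())                      # 2
print(a.label())                         # Spam x2
```

Chapter 31 explains why these special forms are needed and how they behave in each Python line; normal methods with a self argument remain the common case.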


‡ On a somewhat related note, you can also code multiple __init__ methods within the same class, but only 
the last definition will be used; see Chapter 30 for more details on multiple method definitions. 


Inheritance 


The whole point of a namespace tool like the class statement is to support name in- 
heritance. This section expands on some of the mechanisms and roles of attribute in- 
heritance in Python. 


In Python, inheritance happens when an object is qualified, and it involves searching 
an attribute definition tree (one or more namespaces). Every time you use an expression 
of the form object.attr (where object is an instance or class object), Python searches 
the namespace tree from bottom to top, beginning with object, looking for the first 
attr it can find. This includes references to self attributes in your methods. Because 
lower definitions in the tree override higher ones, inheritance forms the basis of 
specialization. 


Attribute Tree Construction 


Figure 28-1 summarizes the way namespace trees are constructed and populated with 
names. Generally: 


• Instance attributes are generated by assignments to self attributes in methods. 

• Class attributes are created by statements (assignments) in class statements. 

• Superclass links are made by listing classes in parentheses in a class statement 
  header. 


The net result is a tree of attribute namespaces that leads from an instance, to the class 
it was generated from, to all the superclasses listed in the class header. Python searches 
upward in this tree, from instances to superclasses, each time you use qualification to 
fetch an attribute name from an instance object.§ 
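
The tree itself can be inspected with the __dict__, __class__, and __bases__ attributes. The sketch below walks it by hand to report where a name is found. It is a deliberately simplified, depth-first model of the search, not Python’s actual lookup code; the find and search helpers and the classes A and B are hypothetical:

```python
class A:
    shared = 'from A'                    # Class attribute in a superclass

class B(A):                              # Superclass link made in the header
    def __init__(self):
        self.own = 'from instance'       # Instance attribute, created via self

def search(klass, name):
    """Simplified search: this class first, then its superclasses, left to right."""
    if name in klass.__dict__:
        return klass.__name__
    for base in klass.__bases__:
        found = search(base, name)
        if found:
            return found
    return None

def find(obj, name):
    """Report where name lives: the instance itself, or a class above it."""
    if name in obj.__dict__:
        return 'instance'
    return search(obj.__class__, name)

x = B()
print(find(x, 'own'))                    # instance
print(find(x, 'shared'))                 # A
print(B.__bases__ == (A,))               # True: the link recorded from the header
```

Real attribute lookup has additional rules that later chapters cover, but for simple trees like this one the bottom-up walk gives the same answer.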


Specializing Inherited Methods 


The tree-searching model of inheritance just described turns out to be a great way to 
specialize systems. Because inheritance finds names in subclasses before it checks su- 
perclasses, subclasses can replace default behavior by redefining their superclasses’ 
attributes. In fact, you can build entire systems as hierarchies of classes, which are 
extended by adding new external subclasses rather than changing existing logic 
in-place. 


§ This description isn’t 100% complete, because we can also create instance and class attributes by assigning 
to objects outside class statements, but that’s a much less common and sometimes more error-prone 
approach (changes aren’t isolated to class statements). In Python, all attributes are always accessible by 
default. We’ll talk more about attribute name privacy in Chapter 29 when we study __setattr__, in 
Chapter 30 when we meet __X names, and again in Chapter 38, where we’ll implement it with a class 
decorator. 


The idea of redefining inherited names leads to a variety of specialization techniques. 
For instance, subclasses may replace inherited attributes completely, provide attributes 
that a superclass expects to find, and extend superclass methods by calling back to the 
superclass from an overridden method. We’ve already seen replacement in action. 
Here’s an example that shows how extension works: 


>>> class Super: 
        def method(self): 
            print('in Super.method') 

>>> class Sub(Super): 
        def method(self):                    # Override method 
            print('starting Sub.method')     # Add actions here 
            Super.method(self)               # Run default action 
            print('ending Sub.method') 


Figure 28-1. Program code creates a tree of objects in memory to be searched by attribute inheritance. 
Calling a class creates a new instance that remembers its class, running a class statement creates a 
new class, and superclasses are listed in parentheses in the class statement header. Each attribute 
reference triggers a new bottom-up tree search, even references to self attributes within a class’s 
methods. 


Direct superclass method calls are the crux of the matter here. The Sub class replaces 
Super’s method function with its own specialized version, but within the replacement, 
Sub calls back to the version exported by Super to carry out the default behavior. In 
other words, Sub.method just extends Super.method’s behavior, rather than replacing it 
completely: 


>>> x = Super()                # Make a Super instance 
>>> x.method()                 # Runs Super.method 
in Super.method 

>>> x = Sub()                  # Make a Sub instance 
>>> x.method()                 # Runs Sub.method, calls Super.method 
starting Sub.method 
in Super.method 
ending Sub.method 


This extension coding pattern is also commonly used with constructors; see “Calling 
Superclass Constructors” in the section “Methods” earlier in this chapter for an example. 


Class Interface Techniques 


Extension is only one way to interface with a superclass. The file shown in this section, 
specialize.py, defines multiple classes that illustrate a variety of common techniques: 


Super 
Defines a method function and a delegate that expects an action in a subclass. 

Inheritor 
Doesn’t provide any new names, so it gets everything defined in Super. 

Replacer 
Overrides Super’s method with a version of its own. 

Extender 
Customizes Super’s method by overriding and calling back to run the default. 

Provider 
Implements the action method expected by Super’s delegate method. 


Study each of these subclasses to get a feel for the various ways they customize their 
common superclass. Here’s the file: 


class Super: 
    def method(self): 
        print('in Super.method')             # Default behavior 
    def delegate(self): 
        self.action()                        # Expected to be defined 

class Inheritor(Super):                      # Inherit method verbatim 
    pass 

class Replacer(Super):                       # Replace method completely 
    def method(self): 
        print('in Replacer.method') 

class Extender(Super):                       # Extend method behavior 
    def method(self): 
        print('starting Extender.method') 
        Super.method(self) 
        print('ending Extender.method') 

class Provider(Super):                       # Fill in a required method 
    def action(self): 
        print('in Provider.action') 

if __name__ == '__main__': 
    for klass in (Inheritor, Replacer, Extender): 
        print('\n' + klass.__name__ + '...') 
        klass().method() 
    print('\nProvider...') 
    x = Provider() 
    x.delegate() 


A few things are worth pointing out here. First, the self-test code at the end of this 
example creates instances of three different classes in a for loop. Because classes are 
objects, you can put them in a tuple and create instances generically (more on this idea 
later). Classes also have the special __name__ attribute, like modules; it’s preset to a 
string containing the name in the class header. Here’s what happens when we run the 
file: 


% python specialize.py 

Inheritor... 
in Super.method 

Replacer... 
in Replacer.method 

Extender... 
starting Extender.method 
in Super.method 
ending Extender.method 

Provider... 
in Provider.action 


Abstract Superclasses 


Notice how the Provider class in the prior example works. When we call the 
delegate method through a Provider instance, two independent inheritance searches 
occur: 


1. On the initial x.delegate call, Python finds the delegate method in Super by 
searching the Provider instance and above. The instance x is passed into the 
method’s self argument as usual. 


2. Inside the Super.delegate method, self.action invokes a new, independent in- 
heritance search of self and above. Because self references a Provider instance, 
the action method is located in the Provider subclass. 


This “filling in the blanks” sort of coding structure is typical of OOP frameworks. At 
least in terms of the delegate method, the superclass in this example is what is 
sometimes called an abstract superclass: a class that expects parts of its behavior to be 
provided by its subclasses. If an expected method is not defined in a subclass, Python 
raises an undefined name exception (an AttributeError) when the inheritance search fails. 
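
A short sketch of that failure follows; the exception raised when an attribute search fails is AttributeError. The Super class here repeats just the delegate method from specialize.py, and the Incomplete subclass is hypothetical:

```python
class Super:
    def delegate(self):
        self.action()                    # Expected to be defined in a subclass

class Incomplete(Super):                 # Hypothetical subclass that forgets action
    pass

try:
    Incomplete().delegate()              # The search for action fails at call time
except AttributeError as exc:
    failure = exc                        # Saved so we can inspect it afterward
    print('AttributeError:', exc)
```

Note that the error appears only when delegate is actually called, not when the class is defined or the instance is made.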


Class coders sometimes make such subclass requirements more obvious with assert 
statements, or by raising the built-in NotImplementedError exception with raise state- 
ments (we'll study statements that may trigger exceptions in depth in the next part of 
this book). As a quick preview, here’s the assert scheme in action: 


class Super: 
    def delegate(self): 
        self.action() 
    def action(self): 
        assert False, 'action must be defined!'    # If this version is called 

>>> X = Super() 
>>> X.delegate() 
AssertionError: action must be defined! 


We'll meet assert in Chapters 32 and 33; in short, if its first expression evaluates 
to false, it raises an exception with the provided error message. Here, the expression 
is always false so as to trigger an error message if a method is not redefined, and in- 
heritance locates the version here. Alternatively, some classes simply raise a 
NotImplementedError exception directly in such method stubs to signal the mistake: 


class Super: 
    def delegate(self): 
        self.action() 
    def action(self): 
        raise NotImplementedError('action must be defined!') 

>>> X = Super() 
>>> X.delegate() 
NotImplementedError: action must be defined! 


For instances of subclasses, we still get the exception unless the subclass provides the 
expected method to replace the default in the superclass: 


>>> class Sub(Super): pass 

>>> X = Sub() 
>>> X.delegate() 
NotImplementedError: action must be defined! 

>>> class Sub(Super): 
        def action(self): print('spam') 

>>> X = Sub() 
>>> X.delegate() 
spam 


For a somewhat more realistic example of this section’s concepts in action, see the “Zoo 
animal hierarchy” exercise (exercise 8) at the end of Chapter 31, and its solution in 
“Part VI, Classes and OOP” in Appendix B. Such taxonomies are a 
traditional way to introduce OOP, but they’re a bit removed from most developers’ job 
descriptions. 


Python 2.6 and 3.0 Abstract Superclasses 


As of Python 2.6 and 3.0, the prior section’s abstract superclasses (a.k.a. “abstract base 
classes”), which require methods to be filled in by subclasses, may also be implemented 
with special class syntax. The way we code this varies slightly depending on the version. 
In Python 3.0, we use a keyword argument in a class header, along with special @ 
decorator syntax, both of which we’ll study in detail later in this book: 


from abc import ABCMeta, abstractmethod 


class Super(metaclass=ABCMeta): 
    @abstractmethod 
    def method(self, ...): 
        pass 


But in Python 2.6, we use a class attribute instead: 


class Super: 
    __metaclass__ = ABCMeta 
    @abstractmethod 
    def method(self, ...): 
        pass 


Either way, the effect is the same—we can’t make an instance unless the method is 
defined lower in the class tree. In 3.0, for example, here is the special syntax equivalent 
of the prior section’s example: 


>>> from abc import ABCMeta, abstractmethod 
>>> 
>>> class Super(metaclass=ABCMeta): 
        def delegate(self): 
            self.action() 
        @abstractmethod 
        def action(self): 
            pass 

>>> X = Super() 
TypeError: Can't instantiate abstract class Super with abstract methods action 

>>> class Sub(Super): pass 

>>> X = Sub() 
TypeError: Can't instantiate abstract class Sub with abstract methods action 

>>> class Sub(Super): 
        def action(self): print('spam') 

>>> X = Sub() 
>>> X.delegate() 
spam 


Coded this way, a class with an abstract method cannot be instantiated (that is, we 
cannot create an instance by calling it) unless all of its abstract methods have been 
defined in subclasses. Although this requires more code, the advantage of this approach 
is that errors for missing methods are issued when we attempt to make an instance of 
the class, not later when we try to call a missing method. This feature may also be used 
to define an expected interface, automatically verified in client classes. 
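

To make the timing difference concrete, the following sketch contrasts the two styles under Python 3.X. It assumes the manual style signals a missing method by raising NotImplementedError; the class names and the when_it_fails helper are invented for this illustration and are not part of this chapter's examples:

```python
from abc import ABCMeta, abstractmethod

class ManualSuper:                        # Hypothetical: manual-check style
    def delegate(self):
        self.action()
    def action(self):
        raise NotImplementedError('action must be defined!')

class CheckedSuper(metaclass=ABCMeta):    # Hypothetical: special-syntax style
    def delegate(self):
        self.action()
    @abstractmethod
    def action(self):
        pass

class ManualSub(ManualSuper): pass        # Both subclasses omit action
class CheckedSub(CheckedSuper): pass

def when_it_fails(cls):
    "Report where a class with a missing method first raises an error"
    try:
        obj = cls()                       # Abstract classes fail here
    except TypeError:
        return 'at instantiation'
    try:
        obj.delegate()                    # Manual checks fail here
    except NotImplementedError:
        return 'at call'
    return 'never'

print(when_it_fails(ManualSub))           # at call
print(when_it_fails(CheckedSub))          # at instantiation
```

The manual version lets a defective instance exist until its method is finally called; the abstract version refuses to build the instance at all.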


Unfortunately, this scheme also relies on two advanced language tools we have not met 
yet—function decorators, introduced in Chapter 31 and covered in depth in Chap- 
ter 38, as well as metaclass declarations, mentioned in Chapter 31 and covered in 
Chapter 39—so we will finesse other facets of this option here. See Python’s standard 
manuals for more on this, as well as precoded abstract superclasses Python provides. 


Namespaces: The Whole Story 


Now that we’ve examined class and instance objects, the Python namespace story is 
complete. For reference, I'll quickly summarize all the rules used to resolve names here. 
The first things you need to remember are that qualified and unqualified names are 
treated differently, and that some scopes serve to initialize object namespaces: 


* Unqualified names (e.g., X) deal with scopes. 

* Qualified attribute names (e.g., object.X) use object namespaces. 

* Some scopes initialize object namespaces (for modules and classes). 


Simple Names: Global Unless Assigned 


Unqualified simple names follow the LEGB lexical scoping rule outlined for functions 
in Chapter 17: 


Assignment (X = value) 
    Makes names local: creates or changes the name X in the current local scope, unless 
    declared global. 

Reference (X) 
    Looks for the name X in the current local scope, then any and all enclosing functions, 
    then the current global scope, then the built-in scope. 
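

As a brief illustration of these two rules, here is a small sketch; the function names are invented for this example. Each function resolves X (or len) at a different layer of the LEGB search:

```python
X = 'global'                      # Global (module) scope

def outer():
    X = 'enclosing'               # Local to outer, enclosing for inner
    def inner():
        return X                  # Not assigned in inner: found in outer
    return inner()

def reads_global():
    return X                      # Not assigned here: found in module

def assigns_local():
    X = 'local'                   # Assignment makes X local to this call
    return X

def finds_builtin():
    return len('spam')            # len is found last, in the built-in scope

print(outer())                    # enclosing
print(reads_global())             # global
print(assigns_local())            # local
print(X)                          # global: module X is unchanged
print(finds_builtin())            # 4
```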


Attribute Names: Object Namespaces 


Qualified attribute names refer to attributes of specific objects and obey the rules for 
modules and classes. For class and instance objects, the reference rules are augmented 
to include the inheritance search procedure: 


Assignment (object.X = value) 
    Creates or alters the attribute name X in the namespace of the object being qualified, 
    and none other. Inheritance-tree climbing happens only on attribute reference, 
    not on attribute assignment. 

Reference (object.X) 
    For class-based objects, searches for the attribute name X in object, then in all 
    accessible classes above it, using the inheritance search procedure. For nonclass 
    objects such as modules, fetches X from object directly. 
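

The asymmetry between attribute assignment and reference can be seen in a short sketch; the class names Base and Derived are invented for this illustration:

```python
class Base:
    X = 1                         # Class attribute, inherited by instances

class Derived(Base):
    pass

a = Derived()
b = Derived()

print(a.X, b.X)                   # 1 1: both found by climbing to Base
a.X = 2                           # Assignment creates X in a only
print(a.X, b.X, Base.X)           # 2 1 1: b and Base are unaffected
print('X' in a.__dict__, 'X' in b.__dict__)     # True False
del a.X                           # Remove the instance's own X
print(a.X)                        # 1: reference climbs to Base again
```

Assigning a.X never changes Base.X; it simply hides it for that one instance until the instance's own attribute is deleted.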


The “Zen” of Python Namespaces: Assignments Classify Names 


With distinct search procedures for qualified and unqualified names, and multiple 
lookup layers for both, it can sometimes be difficult to tell where a name will wind up 
going. In Python, the place where you assign a name is crucial—it fully determines the 
scope or object in which a name will reside. The file manynames.py illustrates how this 
principle translates to code and summarizes the namespace ideas we have seen through- 
out this book: 


# manynames.py 


X = 11                       # Global (module) name/attribute (X, or manynames.X) 

def f(): 
    print(X)                 # Access global X (11) 

def g(): 
    X = 22                   # Local (function) variable (X, hides module X) 
    print(X) 

class C: 
    X = 33                   # Class attribute (C.X) 
    def m(self): 
        X = 44               # Local variable in method (X) 
        self.X = 55          # Instance attribute (instance.X) 


This file assigns the same name, X, five times. Because this name is assigned in five 
different locations, though, all five Xs in this program are completely different variables. 
From top to bottom, the assignments to X here generate: a module attribute (11), a local 
variable in a function (22), a class attribute (33), a local variable in a method (44), and 
an instance attribute (55). Although all five are named X, the fact that they are all as- 
signed at different places in the source code or to different objects makes all of these 
unique variables. 


You should take the time to study this example carefully because it collects ideas we’ve 
been exploring throughout the last few parts of this book. When it makes sense to you, 
you will have achieved a sort of Python namespace nirvana. Of course, an alternative 
route to nirvana is to simply run the program and see what happens. Here’s the remainder 
of this source file, which makes an instance and prints all the Xs that it can fetch: 


# manynames.py, continued 

if __name__ == '__main__': 
    print(X)                 # 11: module (a.k.a. manynames.X outside file) 
    f()                      # 11: global 
    g()                      # 22: local 
    print(X)                 # 11: module name unchanged 

    obj = C()                # Make instance 
    print(obj.X)             # 33: class name inherited by instance 

    obj.m()                  # Attach attribute name X to instance now 
    print(obj.X)             # 55: instance 
    print(C.X)               # 33: class (a.k.a. obj.X if no X in instance) 

    #print(C.m.X)            # FAILS: only visible in method 
    #print(g.X)              # FAILS: only visible in function 


The outputs that are printed when the file is run are noted in the comments in the code; 
trace through them to see which variable named X is being accessed each time. Notice 
in particular that we can go through the class to fetch its attribute (C.X), but we can 
never fetch local variables in functions or methods from outside their def statements. 
Locals are visible only to other code within the def, and in fact only live in memory 
while a call to the function or method is executing. 


Some of the names defined by this file are visible outside the file to other modules, but 
recall that we must always import before we can access names in another file—that is 
the main point of modules, after all: 


# otherfile.py 


import manynames 

X = 66 
print(X)                     # 66: the global here 
print(manynames.X)           # 11: globals become attributes after imports 

manynames.f()                # 11: manynames's X, not the one here! 
manynames.g()                # 22: local in other file's function 

print(manynames.C.X)         # 33: attribute of class in other module 
I = manynames.C() 
print(I.X)                   # 33: still from class here 
I.m() 
print(I.X)                   # 55: now from instance! 


Notice here how manynames.f() prints the X in manynames, not the X assigned in this file: 
scopes are always determined by the position of assignments in your source code (i.e., 
lexically) and are never influenced by what imports what or who imports whom. Also, 
notice that the instance’s own X is not created until we call I.m(). Attributes, like all 
variables, spring into existence when assigned, and not before. Normally we create 
instance attributes by assigning them in class __init__ constructor methods, but this 
isn’t the only option. 


Finally, as we learned in Chapter 17, it’s also possible for a function to change names 
outside itself, with global and (in Python 3.0) nonlocal statements. These statements 
provide write access, but also modify assignment’s namespace binding rules: 


X = 11                       # Global in module 

def g1(): 
    print(X)                 # Reference global in module 

def g2(): 
    global X 
    X = 22                   # Change global in module 

def h1(): 
    X = 33                   # Local in function 
    def nested(): 
        print(X)             # Reference local in enclosing scope 

def h2(): 
    X = 33                   # Local in function 
    def nested(): 
        nonlocal X           # Python 3.0 statement 
        X = 44               # Change local in enclosing scope 


Of course, you generally shouldn’t use the same name for every variable in your script— 
but as this example demonstrates, even if you do, Python’s namespaces will work to 
keep names used in one context from accidentally clashing with those used in another. 
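

To watch these declarations take effect, here is a runnable sketch for Python 3.X; the function names change_global, make_counter, and no_declaration are invented for this illustration:

```python
X = 11                            # Global in module

def change_global():
    global X
    X = 22                        # Rebinds the module-level X

def make_counter():
    count = 0                     # Local in enclosing function
    def increment():
        nonlocal count            # Python 3.X only
        count += 1                # Rebinds count in make_counter's scope
        return count
    return increment

def no_declaration():
    X = 99                        # No declaration: a new local, then discarded

no_declaration()
print(X)                          # 11: module X is untouched
change_global()
print(X)                          # 22: changed through the global statement

counter = make_counter()
print(counter(), counter())       # 1 2: state retained in enclosing scope
```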


Namespace Dictionaries 


In Chapter 22, we learned that module namespaces are actually implemented as dic- 
tionaries and exposed with the built-in __dict__ attribute. The same holds for class and 
instance objects: attribute qualification is really a dictionary indexing operation inter- 
nally, and attribute inheritance is just a matter of searching linked dictionaries. In fact, 
instance and class objects are mostly just dictionaries with links inside Python. Python 
exposes these dictionaries, as well as the links between them, for use in advanced roles 
(e.g., for coding tools). 
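

As a rough model of what "searching linked dictionaries" means, the sketch below looks up a name by hand. The lookup function and the Top and Bottom classes are hypothetical, invented for this illustration. The sketch assumes Python 3.X, where every class has an __mro__ attribute listing the classes to search in order, and it deliberately ignores refinements of the real machinery such as descriptors, __getattr__, and __slots__ (so a method fetched this way is a plain function, not a bound method):

```python
def lookup(instance, name):
    "Hypothetical helper: find name by searching linked namespace dicts"
    if name in instance.__dict__:                 # Instance first
        return instance.__dict__[name]
    for klass in instance.__class__.__mro__:      # Then class, superclasses
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)

class Top:
    color = 'red'
    size = 1

class Bottom(Top):
    size = 2

obj = Bottom()
obj.label = 'spam'

print(lookup(obj, 'label'))       # spam: from the instance's dictionary
print(lookup(obj, 'size'))        # 2: from Bottom, the nearest class
print(lookup(obj, 'color'))       # red: from Top, higher in the tree
```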


To help you understand how attributes work internally, let’s work through an inter- 
active session that traces the way namespace dictionaries grow when classes are in- 
volved. We saw a simpler version of this type of code in Chapter 26, but now that we 
know more about methods and superclasses, let’s embellish it here. First, let’s define 
a superclass and a subclass with methods that will store data in their instances: 

>>> class super: 
        def hello(self): 
            self.data1 = 'spam' 

>>> class sub(super): 
        def hola(self): 
            self.data2 = 'eggs' 


When we make an instance of the subclass, the instance starts out with an empty 
namespace dictionary, but it has links back to the class for the inheritance search to 
follow. In fact, the inheritance tree is explicitly available in special attributes, which 
you can inspect. Instances have a __class__ attribute that links to their class, and classes 
have a __bases__ attribute that is a tuple containing links to higher superclasses (I’m 
running this on Python 3.0; name formats and some internal attributes vary slightly in 
2.6): 


>>> X = sub() 
>>> X.__dict__                       # Instance namespace dict 
{} 
>>> X.__class__                      # Class of instance 
<class '__main__.sub'> 
>>> sub.__bases__                    # Superclasses of class 
(<class '__main__.super'>,) 
>>> super.__bases__                  # () empty tuple in Python 2.6 
(<class 'object'>,) 


As classes assign to self attributes, they populate the instance objects—that is, at- 
tributes wind up in the instances’ attribute namespace dictionaries, not in the classes’. 
An instance object’s namespace records data that can vary from instance to instance, 
and self is a hook into that namespace: 


>>> Y = sub() 

>>> X.hello() 
>>> X.__dict__ 
{'data1': 'spam'} 

>>> X.hola() 
>>> X.__dict__ 
{'data1': 'spam', 'data2': 'eggs'} 

>>> sub.__dict__.keys() 
['__module__', '__doc__', 'hola'] 

>>> super.__dict__.keys() 
['__dict__', '__module__', '__weakref__', 'hello', '__doc__'] 

>>> Y.__dict__ 
{} 


Notice the extra underscore names in the class dictionaries; Python sets these automatically. 
Most are not used in typical programs, but there are tools that use some of 
them (e.g., __doc__ holds the docstrings discussed in Chapter 15). 


Also, observe that Y, a second instance made at the start of this series, still has an empty 
namespace dictionary at the end, even though X’s dictionary has been populated by 
assignments in methods. Again, each instance has an independent namespace dictionary, 
which starts out empty and can record completely different attributes than 
those recorded by the namespace dictionaries of other instances of the same class. 


Because attributes are actually dictionary keys inside Python, there are really two ways 
to fetch and assign their values—by qualification, or by key indexing: 


>>> X.data1, X.__dict__['data1'] 
('spam', 'spam') 

>>> X.data3 = 'toast' 
>>> X.__dict__ 
{'data1': 'spam', 'data3': 'toast', 'data2': 'eggs'} 

>>> X.__dict__['data3'] = 'ham' 
>>> X.data3 
'ham' 


This equivalence applies only to attributes actually attached to the instance, though. 
Because attribute fetch qualification also performs an inheritance search, it can access 
attributes that namespace dictionary indexing cannot. The inherited attribute 
X.hello, for instance, cannot be accessed by X.__dict__['hello']. 
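

This limit on key indexing is easy to confirm in a script. The sketch below uses the capitalized names Super and Sub, a choice made for this example to avoid hiding Python 3.X's built-in super:

```python
class Super:
    def hello(self):
        self.data1 = 'spam'

class Sub(Super):
    pass

X = Sub()
X.hello()

print(X.__dict__['data1'])        # spam: attached to the instance itself
print('hello' in X.__dict__)      # False: hello lives in class Super
print('hello' in Super.__dict__)  # True
try:
    X.__dict__['hello']           # Key indexing skips the inheritance search
except KeyError as exc:
    print('KeyError:', exc)       # KeyError: 'hello'
print(callable(X.hello))          # True: qualification does inherit
```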


Finally, here is the built-in dir function we met in Chapters 4 and 15 at work on class 
and instance objects. This function works on anything with attributes: dir(object) is 
similar to an object.__dict__.keys() call. Notice, though, that dir sorts its list and 
includes some system attributes. As of Python 2.2, dir also collects inherited attributes 
automatically, and in 3.0 it includes names inherited from the object class that is an 
implied superclass of all classes: 


>>> X.__dict__, Y.__dict__ 
({'data1': 'spam', 'data3': 'ham', 'data2': 'eggs'}, {}) 
>>> list(X.__dict__.keys())          # Need list in 3.0 
['data1', 'data3', 'data2'] 

# In Python 2.6: 

>>> dir(X) 
['__doc__', '__module__', 'data1', 'data2', 'data3', 'hello', 'hola'] 
>>> dir(sub) 
['__doc__', '__module__', 'hello', 'hola'] 
>>> dir(super) 
['__doc__', '__module__', 'hello'] 


As you can see, the contents of attribute dictionaries and dir call results may change over time. For example, 
because Python now allows built-in types to be subclassed like classes, the contents of dir results for built-in 
types have expanded to include operator overloading methods, just like our dir results here for user-defined 
classes under Python 3.0. In general, attribute names with leading and trailing double underscores are 
interpreter-specific. Type subclasses will be discussed further in Chapter 31. 


# In Python 3.0: 

>>> dir(X) 
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', 
...more omitted... 
'data1', 'data2', 'data3', 'hello', 'hola'] 

>>> dir(sub) 
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', 
...more omitted... 
'hello', 'hola'] 

>>> dir(super) 
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', 
...more omitted... 
'hello'] 


Experiment with these special attributes on your own to get a better feel for how name- 
spaces actually do their attribute business. Even if you will never use these in the kinds 
of programs you write, seeing that they are just normal dictionaries will help demystify 
the notion of namespaces in general. 


Namespace Links 


The prior section introduced the special __class__ and __bases__ instance and class 
attributes, without really explaining why you might care about them. In short, these 
attributes allow you to inspect inheritance hierarchies within your own code. For ex- 
ample, they can be used to display a class tree, as in the following example: 


# classtree.py 


""" 
Climb inheritance trees using namespace links, 
displaying higher superclasses with indentation 
""" 

def classtree(cls, indent): 
    print('.' * indent + cls.__name__)        # Print class name here 
    for supercls in cls.__bases__:            # Recur to all superclasses 
        classtree(supercls, indent+3)         # May visit super > once 

def instancetree(inst): 
    print('Tree of %s' % inst)                # Show instance 
    classtree(inst.__class__, 3)              # Climb to its class 

def selftest(): 
    class A: pass 
    class B(A): pass 
    class C(A): pass 
    class D(B,C): pass 
    class E: pass 
    class F(D,E): pass 
    instancetree(B()) 
    instancetree(F()) 

if __name__ == '__main__': selftest() 


The classtree function in this script is recursive—it prints a class’s name using 
__name__, then climbs up to the superclasses by calling itself. This allows the function 
to traverse arbitrarily shaped class trees; the recursion climbs to the top, and stops at 
root superclasses that have empty __bases__ attributes. When using recursion, each 
active level of a function gets its own copy of the local scope; here, this means that 
cls and indent are different at each classtree level. 
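

To trace the recursion without relying on printed instance addresses, here is a variant that collects the lines in a list instead of printing them. The treelines name and its default indent are assumptions made for this sketch, not part of classtree.py; the output shown is for Python 3.X, where object appears above every root class:

```python
def treelines(cls, indent=3):
    "Hypothetical variant of classtree: return lines instead of printing"
    lines = ['.' * indent + cls.__name__]
    for supercls in cls.__bases__:                # Recur to all superclasses
        lines.extend(treelines(supercls, indent + 3))
    return lines

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

for line in treelines(D):
    print(line)

# Prints, under Python 3.X:
# ...D
# ......B
# .........A
# ............object
# ......C
# .........A
# ............object
```

Because each call gets its own cls and indent, A and object are listed once per path that reaches them, which is why they appear twice here.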


Most of this file is self-test code. When run standalone in Python 2.6, it builds an empty 
class tree, makes two instances from it, and prints their class tree structures: 


C:\misc> c:\python26\python classtree.py 
Tree of <__main__.B instance at 0x02557328> 


When run under Python 3.0, the tree includes the implied object superclasses that are 
automatically added above standalone classes, because all classes are “new style” in 3.0 
(more on this change in Chapter 31): 


C:\misc> c:\python30\python classtree.py 
Tree of <__main__.B object at 0x02810650> 


Here, indentation marked by periods is used to denote class tree height. Of course, we 
could improve on this output format, and perhaps even sketch it in a GUI display. Even 
as is, though, we can import these functions anywhere we want a quick class tree 
display: 


C:\misc> c:\python30\python 
>>> class Emp: pass 

>>> class Person(Emp): pass 
>>> bob = Person() 

>>> import classtree 
>>> classtree.instancetree(bob) 
Tree of <__main__.Person object at 0x028203B0> 
...Person 


Regardless of whether you will ever code or use such tools, this example demonstrates 
one of the many ways that you can make use of special attributes that expose interpreter 
internals. You'll see another when we code the lister.py general-purpose class display 
tools in the section “Multiple Inheritance: “Mix-in” Classes” on page 756—there, we 
will extend this technique to also display attributes in each object in a class tree. And 
in the last part of this book, we’ll revisit such tools in the context of Python tool building 
at large, to code tools that implement attribute privacy, argument validation, and more. 
While not for every Python programmer, access to internals enables powerful devel- 
opment tools. 


Documentation Strings Revisited 


The last section’s example includes a docstring for its module, but remember that doc- 
strings can be used for class components as well. Docstrings, which we covered in detail 
in Chapter 15, are string literals that show up at the top of various structures and are 
automatically saved by Python in the corresponding objects’ __doc__ attributes. This 
works for module files, function defs, and classes and methods. 


Now that we know more about classes and methods, the following file, docstr.py, pro- 
vides a quick but comprehensive example that summarizes the places where docstrings 
can show up in your code. All of these can be triple-quoted blocks: 


"I am: docstr.__doc__" 

def func(args): 
    "I am: docstr.func.__doc__" 
    pass 

class spam: 
    "I am: spam.__doc__ or docstr.spam.__doc__" 
    def method(self, arg): 
        "I am: spam.method.__doc__ or self.method.__doc__" 
        pass 


The main advantage of documentation strings is that they stick around at runtime. 
Thus, if it’s been coded as a docstring, you can qualify an object with its __doc__ attribute 
to fetch its documentation: 

>>> import docstr 
>>> docstr.__doc__ 
'I am: docstr.__doc__' 
>>> docstr.func.__doc__ 
'I am: docstr.func.__doc__' 
>>> docstr.spam.__doc__ 
'I am: spam.__doc__ or docstr.spam.__doc__' 
>>> docstr.spam.method.__doc__ 
'I am: spam.method.__doc__ or self.method.__doc__' 


A discussion of the PyDoc tool, which knows how to format all these strings in reports, 
appears in Chapter 15. Here it is running on our code under Python 2.6 (Python 3.0 
shows additional attributes inherited from the implied object superclass in the new- 
style class model—run this on your own to see the 3.0 extras, and watch for more about 
this difference in Chapter 31): 


>>> help(docstr) 
Help on module docstr: 

NAME 
    docstr - I am: docstr.__doc__ 

FILE 
    c:\misc\docstr.py 

CLASSES 
    spam 

    class spam 
     |  I am: spam.__doc__ or docstr.spam.__doc__ 
     | 
     |  Methods defined here: 
     | 
     |  method(self, arg) 
     |      I am: spam.method.__doc__ or self.method.__doc__ 

FUNCTIONS 
    func(args) 
        I am: docstr.func.__doc__ 


Documentation strings are available at runtime, but they are less flexible syntactically 
than # comments (which can appear anywhere in a program). Both forms are useful 
tools, and any program documentation is good (as long as it’s accurate, of course!). As 
a best-practice rule of thumb, use docstrings for functional documentation (what your 
objects do) and hash-mark comments for more micro-level documentation (how arcane 
expressions work). 


Classes Versus Modules 


Let’s wrap up this chapter by briefly comparing the topics of this book’s last two parts: 
modules and classes. Because they’re both about namespaces, the distinction can be 
confusing. In short: 

* Modules 
    - Are data/logic packages 
    - Are created by writing Python files or C extensions 
    - Are used by being imported 

* Classes 
    - Implement new objects 
    - Are created by class statements 
    - Are used by being called 
    - Always live within a module 


Classes also support extra features that modules don’t, such as operator overloading, 
multiple instance generation, and inheritance. Although both classes and modules are 
namespaces, you should be able to tell by now that they are very different things. 


Chapter Summary 


This chapter took us on a second, more in-depth tour of the OOP mechanisms of the 
Python language. We learned more about classes, methods, and inheritance, and we 
wrapped up the namespace story in Python by extending it to cover its application to 
classes. Along the way, we looked at some more advanced concepts, such as abstract 
superclasses, class data attributes, namespace dictionaries and links, and manual calls 
to superclass methods and constructors. 


Now that we’ve learned all about the mechanics of coding classes in Python, Chap- 
ter 29 turns to a specific facet of those mechanics: operator overloading. After that we’ll 
explore common design patterns, looking at some of the ways that classes are com- 
monly used and combined to optimize code reuse. Before you read ahead, though, be 
sure to work through the usual chapter quiz to review what we’ve covered here. 


Test Your Knowledge: Quiz 


1. What is an abstract superclass? 


2. What happens when a simple assignment statement appears at the top level of a 
class statement? 


3. Why might a class need to manually call the __init__ method in a superclass? 


4. How can you augment, instead of completely replacing, an inherited method? 


5. What...was the capital of Assyria? 


Test Your Knowledge: Answers 


1. An abstract superclass is a class that calls a method, but does not inherit or define 
it; it expects the method to be filled in by a subclass. This is often used as a way 
to generalize classes when behavior cannot be predicted until a more specific subclass 
is coded. OOP frameworks also use this as a way to dispatch to client-defined, 
customizable operations. 


2. When a simple assignment statement (X = Y) appears at the top level of a class 
statement, it attaches a data attribute to the class (Class.X). Like all class attributes, 
this will be shared by all instances; data attributes are not callable method functions, 
though. 


3. A class must manually call the __init__ method in a superclass if it defines an 
__init__ constructor of its own, but it also must still kick off the superclass’s construction 
code. Python itself automatically runs just one constructor: the lowest 
one in the tree. Superclass constructors are called through the class name, passing 
in the self instance manually: Superclass.__init__(self, ...). 


4. To augment instead of completely replacing an inherited method, redefine it in a 
subclass, but call back to the superclass’s version of the method manually from the 
new version of the method in the subclass. That is, pass the self instance to the 
superclass’s version of the method manually: Superclass.method(self, ...). 


5. Ashur (or Qalat Sherqat), Calah (or Nimrud), the short-lived Dur Sharrukin (or 
Khorsabad), and finally Nineveh. 


CHAPTER 29 
Operator Overloading 


This chapter continues our in-depth survey of class mechanics by focusing on operator 
overloading. We looked briefly at operator overloading in prior chapters; here, we’ll 
fill in more details and look at a handful of commonly used overloading methods. 
Although we won’t demonstrate each of the many operator overloading methods avail- 
able, those we will code here are a representative sample large enough to uncover the 
possibilities of this Python class feature. 


The Basics 


Really “operator overloading” simply means intercepting built-in operations in class 
methods—Python automatically invokes your methods when instances of the class 
appear in built-in operations, and your method’s return value becomes the result of the 
corresponding operation. Here’s a review of the key ideas behind overloading: 


* Operator overloading lets classes intercept normal Python operations. 

* Classes can overload all Python expression operators. 

* Classes can also overload built-in operations such as printing, function calls, 
attribute access, etc. 

* Overloading makes class instances act more like built-in types. 

* Overloading is implemented by providing specially named class methods. 


In other words, when certain specially named methods are provided in a class, Python 
automatically calls them when instances of the class appear in their associated expres- 
sions. As we’ve learned, operator overloading methods are never required and generally 
don’t have defaults; if you don’t code or inherit one, it just means that your class does 
not support the corresponding operation. When used, though, these methods allow 
classes to emulate the interfaces of built-in objects, and so appear more consistent. 


Constructors and Expressions: __init__ and __sub__ 


Consider the following simple example: its Number class, coded in the file number.py, 
provides a method to intercept instance construction (__init__), as well as one for 
catching subtraction expressions (__sub__). Special methods such as these are the hooks 
that let you tie into built-in operations: 


class Number: 
    def __init__(self, start):                  # On Number(start) 
        self.data = start 
    def __sub__(self, other):                   # On instance - other 
        return Number(self.data - other)        # Result is a new instance 


>>> from number import Number                   # Fetch class from module 
>>> X = Number(5)                               # Number.__init__(X, 5) 
>>> Y = X - 2                                   # Number.__sub__(X, 2) 
>>> Y.data                                      # Y is new Number instance 
3 


As discussed previously, the __init__ constructor method seen in this code is the most 
commonly used operator overloading method in Python; it’s present in most classes. 
In this chapter, we will tour some of the other tools available in this domain and look 
at example code that applies them in common use cases. 


Common Operator Overloading Methods 


Just about everything you can do to built-in objects such as integers and lists has a 
corresponding specially named method for overloading in classes. Table 29-1 lists a 
few of the most common; there are many more. In fact, many overloading methods 
come in multiple versions (e.g., __add__, __radd__, and __iadd__ for addition), which 
is one reason there are so many. See other Python books, or the Python language ref- 
erence manual, for an exhaustive list of the special method names available. 


Table 29-1. Common operator overloading methods 


Method                      Implements                          Called for 

__init__                    Constructor                         Object creation: X = Class(args) 
__del__                     Destructor                          Object reclamation of X 
__add__                     Operator +                          X + Y, X += Y if no __iadd__ 
__or__                      Operator | (bitwise OR)             X | Y, X |= Y if no __ior__ 
__repr__, __str__           Printing, conversions               print(X), repr(X), str(X) 
__call__                    Function calls                      X(*args, **kargs) 
__getattr__                 Attribute fetch                     X.undefined 
__setattr__                 Attribute assignment                X.any = value 
__delattr__                 Attribute deletion                  del X.any 
__getattribute__            Attribute fetch                     X.any 
__getitem__                 Indexing, slicing, iteration        X[key], X[i:j], for loops and other iterations 
                                                                if no __iter__ 
__setitem__                 Index and slice assignment          X[key] = value, X[i:j] = sequence 
__delitem__                 Index and slice deletion            del X[key], del X[i:j] 
__len__                     Length                              len(X), truth tests if no __bool__ 
__bool__                    Boolean tests                       bool(X), truth tests (named __nonzero__ in 2.6) 
__lt__, __gt__,             Comparisons                         X < Y, X > Y, X <= Y, X >= Y, X == Y, X != Y 
__le__, __ge__,                                                 (or else __cmp__ in 2.6 only) 
__eq__, __ne__ 
__radd__                    Right-side operators                Other + X 
__iadd__                    In-place augmented operators        X += Y (or else __add__) 
__iter__, __next__          Iteration contexts                  I=iter(X), next(I); for loops, in if no 
                                                                __contains__, all comprehensions, map(F, X), others 
                                                                (__next__ is named next in 2.6) 
__contains__                Membership test                     item in X (any iterable) 
__index__                   Integer value                       hex(X), bin(X), oct(X), O[X], O[X:] 
                                                                (replaces Python 2 __oct__, __hex__) 
__enter__, __exit__         Context manager (Chapter 33)        with obj as var: 
__get__, __set__,           Descriptor attributes (Chapter 37)  X.attr, X.attr = value, del X.attr 
__delete__ 
__new__                     Creation (Chapter 39)               Object creation, before __init__ 


All overloading methods have names that start and end with two underscores to keep 
them distinct from other names you define in your classes. The mappings from special 
method names to expressions or operations are predefined by the Python language (and 
documented in the standard language manual). For example, the name __add__ always 
maps to + expressions by Python language definition, regardless of what an __add__ 
method’s code actually does. 


Operator overloading methods may be inherited from superclasses if not defined, just 
like any other methods. Operator overloading methods are also all optional—if you 
don’t code or inherit one, that operation is simply unsupported by your class, and 
attempting it will raise an exception. Some built-in operations, like printing, have de- 
faults (inherited for the implied object class in Python 3.0), but most built-ins fail for 
class instances if no corresponding operator overloading method is present. 
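

The following sketch illustrates these points under Python 3.X. The classes Plain, Adder, and SubAdder are invented for this example, and the exact wording of the TypeError messages varies across Python versions, so it is not shown:

```python
class Plain:                      # Hypothetical: defines no overloading methods
    pass

class Adder:                      # Hypothetical: supports + only
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        return Adder(self.value + other)

class SubAdder(Adder):            # Inherits __add__ like any other method
    pass

print(str(Plain()).startswith('<'))     # True: default print format inherited
try:
    Plain() + 1                         # No __add__ defined or inherited
except TypeError as exc:
    print('TypeError:', exc)

print((SubAdder(5) + 2).value)          # 7: uses the inherited __add__
try:
    SubAdder(5) - 2                     # No __sub__: unsupported
except TypeError as exc:
    print('TypeError:', exc)
```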


Most overloading methods are used only in advanced programs that require objects to 
behave like built-ins; the __init__ constructor tends to appear in most classes, however, 
so pay special attention to it. We’ve already met the __init__ initialization-time constructor 
method, and a few of the others in Table 29-1. Let’s explore some of the additional 
methods in the table by example. 


Indexing and Slicing: __getitem__ and __setitem__ 


If defined in a class (or inherited by it), the __getitem__ method is called automatically 
for instance-indexing operations. When an instance X appears in an indexing expression 
like X[i], Python calls the __getitem__ method inherited by the instance, passing X to 
the first argument and the index in brackets to the second argument. For example, the 
following class returns the square of an index value: 

>>> class Indexer: 
        def __getitem__(self, index): 
            return index ** 2 

>>> X = Indexer() 
>>> X[2]                             # X[i] calls X.__getitem__(i) 
4 
>>> for i in range(5): 
        print(X[i], end=' ')         # Runs __getitem__(X, i) each time 

0 1 4 9 16 


Intercepting Slices 


Interestingly, in addition to indexing, __getitem__ is also called for slice expressions. 
Formally speaking, built-in types handle slicing the same way. Here, for example, is 
slicing at work on a built-in list, using upper and lower bounds and a stride (see Chap- 
ter 7 if you need a refresher on slicing): 


>>> L = [5, 6, 7, 8, 9] 
>>> L[2:4]                           # Slice with slice syntax 
[7, 8] 
>>> L[1:] 
[6, 7, 8, 9] 
>>> L[:-1] 
[5, 6, 7, 8] 
>>> L[::2] 
[5, 7, 9] 


Really, though, slicing bounds are bundled up into a slice object and passed to the list’s 
implementation of indexing. In fact, you can always pass a slice object manually—slice 
syntax is mostly syntactic sugar for indexing with a slice object: 


>>> L[slice(2, 4)] # Slice with slice objects 
[7, 8] 

>>> L[slice(1, None)] 

[6, 7, 8, 9] 

>>> L[slice(None, -1)] 

[5, 6, 7, 8] 

>>> L[slice(None, None, 2)] 

[5, 7, 9] 


This matters in classes with a __getitem__ method: the method will be called both for 
basic indexing (with an index) and for slicing (with a slice object). Our previous class 
won't handle slicing because its math assumes integer indexes are passed, but the following 
class will. When called for indexing, the argument is an integer as before: 


>>> class Indexer:
        data = [5, 6, 7, 8, 9]
        def __getitem__(self, index):   # Called for index or slice
            print('getitem:', index)
            return self.data[index]     # Perform index or slice

>>> X = Indexer()
>>> X[0]                                # Indexing sends __getitem__ an integer
getitem: 0
5
>>> X[1]
getitem: 1
6
>>> X[-1]
getitem: -1
9


When called for slicing, though, the method receives a slice object, which is simply 
passed along to the embedded list indexer in a new index expression: 


>>> X[2:4]                              # Slicing sends __getitem__ a slice object
getitem: slice(2, 4, None)
[7, 8]
>>> X[1:]
getitem: slice(1, None, None)
[6, 7, 8, 9]
>>> X[:-1]
getitem: slice(None, -1, None)
[5, 6, 7, 8]
>>> X[::2]
getitem: slice(None, None, 2)
[5, 7, 9]


If used, the __setitem__ index assignment method similarly intercepts both index and
slice assignments; it receives a slice object for the latter, which may be passed along
in another index assignment in the same way:


    def __setitem__(self, index, value):    # Intercept index or slice assignment
        self.data[index] = value            # Assign index or slice
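
To see both methods at work together, here is a minimal sketch that fills out the
fragment above into a complete class. The class name IndexSetter and its test data are
made up for this illustration; the two methods simply pass whatever they receive
(integer or slice object) along to the embedded list:

```python
class IndexSetter:
    def __init__(self, data):
        self.data = list(data)              # Private copy of the data
    def __getitem__(self, index):           # Called for index or slice fetch
        return self.data[index]
    def __setitem__(self, index, value):    # Called for index or slice assignment
        self.data[index] = value

X = IndexSetter([5, 6, 7, 8, 9])
X[0] = 50                                   # index is the integer 0
X[1:3] = ['a', 'b', 'c']                    # index is slice(1, 3, None)
print(X.data)                               # [50, 'a', 'b', 'c', 8, 9]
print(X[::2])                               # [50, 'b', 8]
```

Because the list itself does the real work, slice assignment can grow or shrink the
data here just as it does for a built-in list.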


In fact, __getitem__ may be called automatically in even more contexts than indexing
and slicing, as the next section explains.


Slicing and Indexing in Python 2.6 


Prior to Python 3.0, classes could also define __getslice__ and __setslice__ methods
to intercept slice fetches and assignments specifically; they were passed the bounds of
the slice expression and were preferred over __getitem__ and __setitem__ for slices.
These slice-specific methods have been removed in 3.0, so you should use
__getitem__ and __setitem__ instead and allow for both indexes and slice objects as
arguments. In most classes, this works without any special code, because indexing
methods can manually pass along the slice object in the square brackets of another
index expression (as in our example). See the section “Membership: __contains__,
__iter__, and __getitem__” for another example of slice interception at work.
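
When there is no embedded sequence to pass the slice object along to, a class must tell
indexes and slices apart itself. The following is one possible sketch of that case, not
the only way to code it: the Evens class is hypothetical, it computes its items on
demand, and it uses the slice object's standard indices method to resolve omitted and
negative bounds:

```python
class Evens:
    """Virtual sequence of the first `size` even numbers; nothing is stored."""
    def __init__(self, size):
        self.size = size
    def __getitem__(self, index):
        if isinstance(index, slice):                    # Slice: X[i:j:k]
            offsets = range(*index.indices(self.size))  # Resolve None and negatives
            return [2 * i for i in offsets]
        if index < 0:                                   # Integer: X[i]
            index += self.size
        if not 0 <= index < self.size:
            raise IndexError(index)
        return 2 * index

X = Evens(5)
print(X[1], X[-1])          # 2 8
print(X[1:4])               # [2, 4, 6]
print(X[::-2])              # [8, 4, 0]
```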


Also, don’t confuse the (arguably unfortunately named) __index__ method in Python 
3.0 for index interception; this method returns an integer value for an instance when 
needed and is used by built-ins that convert to digit strings: 


>>> class C:
        def __index__(self):
            return 255

>>> X = C()
>>> hex(X)                              # Integer value
'0xff'
>>> bin(X)
'0b11111111'
>>> oct(X)
'0o377'


Although this method does not intercept instance indexing like __getitem__, it is also
used in contexts that require an integer, including indexing:


>>> ('C' * 256)[255]
'C'
>>> ('C' * 256)[X]                      # As index (not X[i])
'C'
>>> ('C' * 256)[X:]                     # As index (not X[i:])
'C'


This method works the same way in Python 2.6, except that it is not called for the
hex and oct built-in functions (use __hex__ and __oct__ in 2.6 instead to intercept these
calls).
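
As a further illustration of the difference, the following sketch uses instances as
offsets into a built-in list, both alone and as slice bounds. The Offset class and its
value attribute are invented for this example; the instance is never indexed itself, it
only supplies an integer when one is needed:

```python
class Offset:
    def __init__(self, value):
        self.value = value
    def __index__(self):                # Integer value of this instance
        return self.value

L = [5, 6, 7, 8, 9]
print(L[Offset(1)])                     # 6
print(L[Offset(1):Offset(3)])           # [6, 7]
print(bin(Offset(5)))                   # 0b101
```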


Index Iteration: __getitem__


Here’s a trick that isn’t always obvious to beginners, but turns out to be surprisingly
useful. The for statement works by repeatedly indexing a sequence from zero to higher
indexes, until an out-of-bounds exception is detected. Because of that, __getitem__ also
turns out to be one way to overload iteration in Python: if this method is defined,
for loops call the class’s __getitem__ each time through, with successively higher
offsets. It’s a case of “buy one, get one free”: any built-in or user-defined object that
responds to indexing also responds to iteration:


>>> class stepper:
        def __getitem__(self, i):
            return self.data[i]

>>> X = stepper()                       # X is a stepper object
>>> X.data = "Spam"
>>>
>>> X[1]                                # Indexing calls __getitem__
'p'
>>> for item in X:                      # for loops call __getitem__
        print(item, end=' ')            # for indexes items 0..N

S p a m


In fact, it’s really a case of “buy one, get a bunch free.” Any class that supports for loops 
automatically supports all iteration contexts in Python, many of which we’ve seen in 
earlier chapters (iteration contexts were presented in Chapter 14). For example, the 
in membership test, list comprehensions, the map built-in, list and tuple assignments, 
and type constructors will also call __getitem__ automatically, if it’s defined:


>>> 'p' in X                            # All call __getitem__ too
True
>>> [c for c in X]                      # List comprehension
['S', 'p', 'a', 'm']
>>> list(map(str.upper, X))             # map calls (use list() in 3.0)
['S', 'P', 'A', 'M']
>>> (a, b, c, d) = X                    # Sequence assignments
>>> a, c, d
('S', 'a', 'm')
>>> list(X), tuple(X), ''.join(X)
(['S', 'p', 'a', 'm'], ('S', 'p', 'a', 'm'), 'Spam')
>>> X
<__main__.stepper object at 0x00A8D5D0>


In practice, this technique can be used to create objects that provide a sequence interface
and to add logic to built-in sequence type operations; we’ll revisit this idea when
extending built-in types in Chapter 31.
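
The stepper class gets its out-of-bounds exception for free from the embedded string. A
class that computes its results must raise IndexError itself to end the loop. Here is a
small sketch of that case; the Countdown class is hypothetical and exists only to show
the protocol:

```python
class Countdown:
    """Hypothetical sequence-like class: offsets 0..N-1 map to N..1."""
    def __init__(self, start):
        self.start = start
    def __getitem__(self, i):
        if not 0 <= i < self.start:     # Out of bounds: ends iteration
            raise IndexError(i)
        return self.start - i

print(list(Countdown(3)))               # [3, 2, 1]
print(2 in Countdown(3))                # True
a, b, c = Countdown(3)                  # Sequence assignment
print(max(Countdown(3)), b)             # 3 2
```

Without the IndexError, iteration contexts would keep asking for higher offsets
forever, because nothing else tells them where the data ends.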


Iterator Objects: __iter__ and __next__


Although the __getitem__ technique of the prior section works, it’s really just a fallback
for iteration. Today, all iteration contexts in Python will try the __iter__ method first,
before trying __getitem__. That is, they prefer the iteration protocol we learned about
in Chapter 14 to repeatedly indexing an object; only if the object does not support the
iteration protocol is indexing attempted instead. Generally speaking, you should prefer
__iter__ too; it supports general iteration contexts better than __getitem__ can.


Technically, iteration contexts work by calling the iter built-in function to try to find
an __iter__ method, which is expected to return an iterator object. If it’s provided,
Python then repeatedly calls this iterator object’s __next__ method to produce items
until a StopIteration exception is raised. If no such __iter__ method is found, Python
falls back on the __getitem__ scheme and repeatedly indexes by offsets as before, until
an IndexError exception is raised. A next built-in function is also available as a
convenience for manual iterations: next(I) is the same as I.__next__().
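
The steps just described can be sketched in Python code. The function below is only an
approximation of what a for loop does internally, and its name, manual_for, is made up
for this example; it collects the items it visits so the result is easy to inspect:

```python
def manual_for(obj):
    """Roughly what `for x in obj` does; a sketch, not Python's real code."""
    results = []
    I = iter(obj)                       # Uses __iter__, or falls back on __getitem__
    while True:
        try:
            x = next(I)                 # Same as I.__next__() in 3.0
        except StopIteration:           # Raised when the iterator is exhausted
            break
        results.append(x)               # Loop body goes here
    return results

print(manual_for([1, 2, 3]))            # [1, 2, 3]
print(manual_for('ab'))                 # ['a', 'b']
```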


Version skew note: As described in Chapter 14, if you are using Python
2.6, the I.__next__() method just described is named I.next() in your
Python, and the next(I) built-in is present for portability: it calls
I.next() in 2.6 and I.__next__() in 3.0. Iteration works the same in 2.6
in all other respects.


User-Defined Iterators 


In the __iter__ scheme, classes implement user-defined iterators by simply
implementing the iteration protocol introduced in Chapters 14 and 20 (refer back to those
chapters for more background details on iterators). For example, the following file,
iters.py, defines a user-defined iterator class that generates squares:

class Squares:
    def __init__(self, start, stop):    # Save state when created
        self.value = start - 1
        self.stop = stop
    def __iter__(self):                 # Get iterator object on iter
        return self
    def __next__(self):                 # Return a square on each iteration
        if self.value == self.stop:     # Also called by next built-in
            raise StopIteration
        self.value += 1
        return self.value ** 2

% python
>>> from iters import Squares
>>> for i in Squares(1, 5):             # for calls iter, which calls __iter__
        print(i, end=' ')               # Each iteration calls __next__

1 4 9 16 25


Here, the iterator object is simply the instance self, because the __next__ method is 
part of this class. In more complex scenarios, the iterator object may be defined as a 
separate class and object with its own state information to support multiple active 
iterations over the same data (we’ll see an example of this in a moment). The end of 
the iteration is signaled with a Python raise statement (more on raising exceptions in 
the next part of this book). Manual iterations work as for built-in types as well: 


>>> X = Squares(1, 5)                   # Iterate manually: what loops do
>>> I = iter(X)                         # iter calls __iter__
>>> next(I)                             # next calls __next__
1
>>> next(I)
4
...more omitted...
>>> next(I)
25
>>> next(I)                             # Can catch this in try statement
StopIteration


An equivalent coding of this iterator with __getitem__ might be less natural, because
the for would then iterate through all offsets zero and higher; the offsets passed in
would be only indirectly related to the range of values produced (0..N would need to
map to start..stop). Because __iter__ objects retain explicitly managed state between
next calls, they can be more general than __getitem__.
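
For comparison, here is roughly what such a __getitem__ coding might look like. This is
a sketch under my own naming (SquaresByIndex is not part of iters.py); notice the extra
arithmetic needed to map the offsets passed in to the range of values requested:

```python
class SquaresByIndex:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __getitem__(self, offset):              # for passes offsets 0, 1, 2, ...
        value = self.start + offset             # Map offset 0..N to start..stop
        if offset < 0 or value > self.stop:
            raise IndexError(offset)            # Out of range: ends the loop
        return value ** 2

print(list(SquaresByIndex(1, 5)))               # [1, 4, 9, 16, 25]
print(SquaresByIndex(1, 5)[2])                  # 9
```

This version does support random access and repeated traversals, but only because each
result can be computed from its offset alone; iterations that depend on prior state
do not map to offsets this easily.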


On the other hand, using iterators based on __iter__ can sometimes be more complex
and less convenient than using __getitem__. They are really designed for iteration, not
random indexing; in fact, they don’t overload the indexing expression at all:

>>> X = Squares(1, 5)
>>> X[1]
TypeError: 'Squares' object does not support indexing


The __iter__ scheme is also the implementation for all the other iteration contexts we
saw in action for __getitem__ (membership tests, type constructors, sequence
assignment, and so on). However, unlike our prior __getitem__ example, we also need to be
aware that a class’s __iter__ may be designed for a single traversal, not many. For
example, the Squares class is a one-shot iteration; once you’ve iterated over an instance
of that class, it’s empty. You need to make a new iterator object for each new iteration:


>>> X = Squares(1, 5) 


>>> [n for n in X] # Exhausts items 

[1, 4, 9, 16, 25] 

>>> [n for n in X] # Now it's empty 

[] 

>>> [n for n in Squares(1, 5)] # Make a new iterator object 


[1, 4, 9, 16, 25] 
>>> list(Squares(1, 3)) 
[1, 4, 9] 


Notice that this example would probably be simpler if it were coded with generator
functions or expressions (topics introduced in Chapter 20 and related to iterators):

>>> def gsquares(start, stop):
        for i in range(start, stop+1):
            yield i ** 2

>>> for i in gsquares(1, 5):            # or: (x ** 2 for x in range(1, 6))
        print(i, end=' ')

1 4 9 16 25


Unlike the class, the function automatically saves its state between iterations. Of course, 
for this artificial example, you could in fact skip both techniques and simply use a 
for loop, map, or a list comprehension to build the list all at once. The best and fastest 
way to accomplish a task in Python is often also the simplest:

>>> [x ** 2 for x in range(1, 6)]
[1, 4, 9, 16, 25]


However, classes may be better at modeling more complex iterations, especially when 
they can benefit from state information and inheritance hierarchies. The next section 
explores one such use case. 


Multiple Iterators on One Object 


Earlier, I mentioned that the iterator object may be defined as a separate class with its
own state information to support multiple active iterations over the same data.
Consider what happens when we step across a built-in type like a string:

>>> S = 'ace'
>>> for x in S:
        for y in S:
            print(x + y, end=' ')

aa ac ae ca cc ce ea ec ee


Here, the outer loop grabs an iterator from the string by calling iter, and each nested 
loop does the same to get an independent iterator. Because each active iterator has its 
own state information, each loop can maintain its own position in the string, regardless 
of any other active loops. 


We saw related examples earlier, in Chapters 14 and 20. For instance, generator
functions and expressions, as well as built-ins like map and zip, proved to be single-iterator
objects; by contrast, the range built-in and other built-in types, like lists, support
multiple active iterators with independent positions.


When we code user-defined iterators with classes, it’s up to us to decide whether we
will support a single active iteration or many. To achieve the multiple-iterator effect,
__iter__ simply needs to define a new stateful object for the iterator, instead of
returning self.


The following, for example, defines an iterator class that skips every other item on 
iterations. Because the iterator object is created anew for each iteration, it supports 
multiple active loops: 


class SkipIterator:
    def __init__(self, wrapped):
        self.wrapped = wrapped                  # Iterator state information
        self.offset = 0
    def __next__(self):
        if self.offset >= len(self.wrapped):    # Terminate iterations
            raise StopIteration
        else:
            item = self.wrapped[self.offset]    # else return and skip
            self.offset += 2
            return item

class SkipObject:
    def __init__(self, wrapped):                # Save item to be used
        self.wrapped = wrapped
    def __iter__(self):
        return SkipIterator(self.wrapped)       # New iterator each time

if __name__ == '__main__':
    alpha = 'abcdef'
    skipper = SkipObject(alpha)                 # Make container object
    I = iter(skipper)                           # Make an iterator on it
    print(next(I), next(I), next(I))            # Visit offsets 0, 2, 4

    for x in skipper:               # for calls __iter__ automatically
        for y in skipper:           # Nested fors call __iter__ again each time
            print(x + y, end=' ')   # Each iterator has its own state, offset


When run, this example works like the nested loops with built-in strings. Each active 
loop has its own position in the string because each obtains an independent iterator 
object that records its own state information: 

% python skipper.py
a c e
aa ac ae ca cc ce ea ec ee


By contrast, our earlier Squares example supports just one active iteration, unless we 
call Squares again in nested loops to obtain new objects. Here, there is just one 
SkipObject, with multiple iterator objects created from it. 
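
The same division of labor could be applied to the squares example to lift its one-shot
limitation. The following is a sketch of that idea, not a replacement for iters.py; the
class names SquaresIterator and MultiSquares are my own:

```python
class SquaresIterator:
    def __init__(self, start, stop):            # Per-iteration state
        self.value = start - 1
        self.stop = stop
    def __iter__(self):                         # Iterators return themselves
        return self
    def __next__(self):
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return self.value ** 2

class MultiSquares:
    def __init__(self, start, stop):            # Only the bounds are saved here
        self.start = start
        self.stop = stop
    def __iter__(self):
        return SquaresIterator(self.start, self.stop)   # New iterator each time

X = MultiSquares(1, 3)
print(list(X), list(X))                         # Not exhausted by the first pass
print([(a, b) for a in X for b in X if a < b])  # Nested loops keep their own place
```

Here the container object never changes during iteration; all the position
information lives in the iterator objects that __iter__ creates on request.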


As before, we could achieve similar results with built-in tools—for example, slicing 
with a third bound to skip items: 

>>> S = 'abcdef'
>>> for x in S[::2]:
        for y in S[::2]:                # New objects on each iteration
            print(x + y, end=' ')

aa ac ae ca cc ce ea ec ee


This isn’t quite the same, though, for two reasons. First, each slice expression here will
physically store the result list all at once in memory; iterators, on the other hand,
produce just one value at a time, which can save substantial space for large result lists.
Second, slices produce new objects, so we’re not really iterating over the same object
in multiple places here. To be closer to the class, we would need to make a single object
to step across by slicing ahead of time:


>>> S = 'abcdef'
>>> S = S[::2]
>>> S
'ace'
>>> for x in S:
        for y in S:                     # Same object, new iterators
            print(x + y, end=' ')

aa ac ae ca cc ce ea ec ee

This is more similar to our class-based solution, but it still stores the slice result in
memory all at once (there is no generator form of built-in slicing today), and it’s only
equivalent for this particular case of skipping every other item.
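
Although slice expressions themselves always build their results in full, the standard
library's itertools.islice function offers a lazy alternative that is worth knowing
about. The following sketch is an aside rather than something the examples here depend
on; it steps through every other character without ever making a sliced copy:

```python
from itertools import islice

S = 'abcdef'
for x in islice(S, 0, None, 2):         # Items at offsets 0, 2, 4, one at a time
    for y in islice(S, 0, None, 2):     # Each islice call makes a new iterator
        print(x + y, end=' ')
print()

I = islice(S, 0, None, 2)
print(next(I), next(I), next(I))        # a c e
print(list(I))                          # []: an islice object is single-pass
```

Like generators, each islice result supports just one traversal, so the nested loop
must call islice again to get an iterator with its own position.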


Because iterators can do anything a class can do, they are much more general than this 
example may imply. Regardless of whether our applications require such generality, 
user-defined iterators are a powerful tool—they allow us to make arbitrary objects look 
and feel like the other sequences and iterables we have met in this book. We could use 
this technique with a database object, for example, to support iterations over database 
fetches, with multiple cursors into the same query result. 


Membership: __contains__, __iter__, and __getitem__


The iteration story is even richer than we’ve seen thus far. Operator overloading is often 
layered: classes may provide specific methods, or more general alternatives used as 
fallback options. For example: 


• Comparisons in Python 2.6 use specific methods such as __lt__ for less than if
  present, or else the general __cmp__. Python 3.0 uses only specific methods, not
  __cmp__, as discussed later in this chapter.

• Boolean tests similarly try a specific __bool__ first (to give an explicit True/False
  result), and if it’s absent fall back on the more general __len__ (a nonzero length
  means True). As we’ll also see later in this chapter, Python 2.6 works the same but
  uses the name __nonzero__ instead of __bool__.


In the iterations domain, classes normally implement the in membership operator as
an iteration, using either the __iter__ method or the __getitem__ method. To support
more specific membership, though, classes may code a __contains__ method; when
present, this method is preferred over __iter__, which is preferred over __getitem__.
The __contains__ method should define membership as applying to keys for a
mapping (and can use quick lookups), and as a search for sequences.
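
To make the mapping versus sequence distinction concrete, here is a brief sketch with
two hypothetical classes, Registry and Bag, invented for this illustration. The first
answers membership with a single dictionary lookup on keys; the second searches its
items one at a time:

```python
class Registry:
    """Hypothetical mapping-like class: membership applies to keys."""
    def __init__(self, **pairs):
        self.table = dict(pairs)
    def __iter__(self):                 # Iteration steps through keys
        return iter(self.table)
    def __contains__(self, key):        # Membership: one quick dictionary lookup
        return key in self.table

class Bag:
    """Hypothetical sequence-like class: membership is a search of items."""
    def __init__(self, *items):
        self.items = list(items)
    def __getitem__(self, i):
        return self.items[i]
    def __contains__(self, item):       # Membership: scan items one by one
        return any(item == x for x in self.items)

R = Registry(spam=1, eggs=2)
print('spam' in R, 1 in R)              # True False: keys, not values
B = Bag('spam', 'eggs')
print('eggs' in B, 'ham' in B)          # True False
```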


Consider the following class, which codes all three methods and tests membership and
various iteration contexts applied to an instance. Its methods print trace messages when
called:


class Iters:
    def __init__(self, value):
        self.data = value
    def __getitem__(self, i):               # Fallback for iteration
        print('get[%s]:' % i, end='')       # Also for index, slice
        return self.data[i]
    def __iter__(self):                     # Preferred for iteration
        print('iter=> ', end='')            # Allows only 1 active iterator
        self.ix = 0
        return self
    def __next__(self):
        print('next:', end='')
        if self.ix == len(self.data): raise StopIteration
        item = self.data[self.ix]
        self.ix += 1
        return item
    def __contains__(self, x):              # Preferred for 'in'
        print('contains: ', end='')
        return x in self.data

X = Iters([1, 2, 3, 4, 5])                  # Make instance
print(3 in X)                               # Membership
for i in X:                                 # For loops
    print(i, end=' | ')
print()
print([i ** 2 for i in X])                  # Other iteration contexts
print( list(map(bin, X)) )

I = iter(X)                                 # Manual iteration (what other contexts do)
while True:
    try:
        print(next(I), end=' @ ')
    except StopIteration:
        break


When run as is, this script’s output is as follows: the specific __contains__ intercepts
membership, the general __iter__ catches other iteration contexts such that __next__
is called repeatedly, and __getitem__ is never called:

contains: True
iter=> next:1 | next:2 | next:3 | next:4 | next:5 | next:
iter=> next:next:next:next:next:next:[1, 4, 9, 16, 25]
iter=> next:next:next:next:next:next:['0b1', '0b10', '0b11', '0b100', '0b101']
iter=> next:1 @ next:2 @ next:3 @ next:4 @ next:5 @ next:


Watch what happens to this code’s output if we comment out its __contains__ method,
though; membership is now routed to the general __iter__ instead:

iter=> next:next:next:True
iter=> next:1 | next:2 | next:3 | next:4 | next:5 | next:
iter=> next:next:next:next:next:next:[1, 4, 9, 16, 25]
iter=> next:next:next:next:next:next:['0b1', '0b10', '0b11', '0b100', '0b101']
iter=> next:1 @ next:2 @ next:3 @ next:4 @ next:5 @ next:


And finally, here is the output if both __contains__ and __iter__ are commented out;
the indexing __getitem__ fallback is called with successively higher indexes for
membership and other iteration contexts:

get[0]:get[1]:get[2]:True
get[0]:1 | get[1]:2 | get[2]:3 | get[3]:4 | get[4]:5 | get[5]:
get[0]:get[1]:get[2]:get[3]:get[4]:get[5]:[1, 4, 9, 16, 25]
get[0]:get[1]:get[2]:get[3]:get[4]:get[5]:['0b1', '0b10', '0b11', '0b100', '0b101']
get[0]:1 @ get[1]:2 @ get[2]:3 @ get[3]:4 @ get[4]:5 @ get[5]:

As we’ve seen, the __getitem__ method is even more general: besides iterations, it also
intercepts explicit indexing as well as slicing. Slice expressions trigger __getitem__ with
a slice object containing bounds, both for built-in types and user-defined classes, so
slicing is automatic in our class:


>>> X = Iters('spam')                   # Indexing
>>> X[0]                                # __getitem__(0)
get[0]:'s'
>>> 'spam'[1:]                          # Slice syntax
'pam'
>>> 'spam'[slice(1, None)]              # Slice object
'pam'
>>> X[1:]                               # __getitem__(slice(..))
get[slice(1, None, None)]:'pam'
>>> X[:-1]
get[slice(None, -1, None)]:'spa'


In more realistic iteration use cases that are not sequence-oriented, though, the
__iter__ method may be easier to write since it need not manage an integer index, and
__contains__ allows for membership optimization as a special case.


Attribute Reference: __getattr__ and __setattr__


The __getattr__ method intercepts attribute qualifications. More specifically, it’s
called with the attribute name as a string whenever you try to qualify an instance with
an undefined (nonexistent) attribute name. It is not called if Python can find the attribute
using its inheritance tree search procedure. Because of its behavior, __getattr__ is
useful as a hook for responding to attribute requests in a generic fashion. For example:

>>> class empty:
        def __getattr__(self, attrname):
            if attrname == "age":
                return 40
            else:
                raise AttributeError(attrname)

>>> X = empty()
>>> X.age
40
>>> X.name
...error text omitted...
AttributeError: name


Here, the empty class and its instance X have no real attributes of their own, so the access
to X.age gets routed to the __getattr__ method; self is assigned the instance (X), and
attrname is assigned the undefined attribute name string ("age"). The class makes age
look like a real attribute by returning a real value as the result of the X.age qualification
expression (40). In effect, age becomes a dynamically computed attribute.

For attributes that the class doesn’t know how to handle, __getattr__ raises the
built-in AttributeError exception to tell Python that these are bona fide undefined names;
asking for X.name triggers the error. You’ll see __getattr__ again when we see delegation
and properties at work in the next two chapters, and I’ll say more about exceptions in
Part VII.
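
To verify that __getattr__ really is run for undefined names only, consider the
following sketch, which mixes a real instance attribute with a computed one. The Person
class and the trace message are made up for this demonstration:

```python
class Person:
    def __init__(self, name):
        self.name = name                        # A real instance attribute
    def __getattr__(self, attrname):            # Run for undefined names only
        print('getattr:', attrname)
        if attrname == 'age':
            return 40                           # Computed on each fetch
        raise AttributeError(attrname)

X = Person('Bob')
print(X.name)                   # Found in the instance: __getattr__ not called
print(X.age)                    # Undefined: routed to __getattr__
print(hasattr(X, 'pay'))        # False: AttributeError means "no such name"
```

The trace message appears for age and pay but not for name, because name is found by
the normal search before __getattr__ is ever consulted.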


A related overloading method, __setattr__, intercepts all attribute assignments. If this
method is defined, self.attr = value becomes self.__setattr__('attr', value). This
is a bit trickier to use because assigning to any self attributes within __setattr__ calls
__setattr__ again, causing an infinite recursion loop (and eventually, a stack overflow
exception!). If you want to use this method, be sure that it assigns any instance
attributes by indexing the attribute dictionary, discussed in the next section. That is, use
self.__dict__['name'] = x, not self.name = x:

>>> class accesscontrol:
        def __setattr__(self, attr, value):
            if attr == 'age':
                self.__dict__[attr] = value
            else:
                raise AttributeError(attr + ' not allowed')

>>> X = accesscontrol()
>>> X.age = 40                          # Calls __setattr__
>>> X.age
40
>>> X.name = 'mel'
...text omitted...
AttributeError: name not allowed


These two attribute-access overloading methods allow you to control or specialize
access to attributes in your objects. They tend to play highly specialized roles, some of
which we’ll explore later in this book.
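
If you are curious about what the recursion loop described earlier looks like in
practice, the following sketch triggers it on purpose and catches the result. The class
names Loops and NoLoops are hypothetical; the code relies only on the fact that Python
reports runaway recursion with an exception that can be caught as a RuntimeError:

```python
class Loops:
    def __setattr__(self, attr, value):
        self.data = value                   # Assignment to self: runs __setattr__ again!

class NoLoops:
    def __setattr__(self, attr, value):
        self.__dict__[attr] = value         # Dictionary store: no recursive call

try:
    Loops().age = 40
    looped = False
except RuntimeError:                        # Raised when recursion runs too deep
    looped = True
print('recursion detected:', looped)

X = NoLoops()
X.age = 40
print(X.age)                                # 40
```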


Other Attribute Management Tools 


For future reference, also note that there are other ways to manage attribute access in 
Python: 


• The __getattribute__ method intercepts all attribute fetches, not just those that
  are undefined, but when using it you must be more cautious than with
  __getattr__ to avoid loops.

• The property built-in function allows us to associate methods with fetch and set
  operations on a specific class attribute.

• Descriptors provide a protocol for associating __get__ and __set__ methods of a
  class with accesses to a specific class attribute.


Because these are somewhat advanced tools not of interest to every Python
programmer, we’ll defer a look at properties until Chapter 31 and detailed coverage of all the
attribute management techniques until Chapter 37.
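
As a brief preview only, here is a sketch of the first two tools in this list. The
classes and method names (Person, getName, setName, Tracer) are invented for this
example, and the later chapters cover the details; note how __getattribute__ avoids a
loop by handing the fetch to the object superclass instead of fetching from self:

```python
class Person:
    def __init__(self, name):
        self._name = name
    def getName(self):
        print('fetch...')
        return self._name
    def setName(self, value):
        print('change...')
        self._name = value
    name = property(getName, setName)       # Route X.name fetches and assignments

class Tracer:
    def __init__(self):
        self.data = 'spam'
    def __getattribute__(self, attrname):   # Run for every attribute fetch
        print('get:', attrname)
        # Fetching self.anything here would loop: defer to the object superclass
        return object.__getattribute__(self, attrname)

bob = Person('Bob')
bob.name = 'Robert'                         # Runs setName
print(bob.name)                             # Runs getName
print(Tracer().data)                        # Runs __getattribute__
```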

Emulating Privacy for Instance Attributes: Part 1


The following code generalizes the previous example, to allow each subclass to have
its own list of private names that cannot be assigned to its instances:


class PrivateExc(Exception): pass                   # More on exceptions later

class Privacy:
    def __setattr__(self, attrname, value):         # On self.attrname = value
        if attrname in self.privates:
            raise PrivateExc(attrname, self)
        else:
            self.__dict__[attrname] = value         # self.attrname = value loops!

class Test1(Privacy):
    privates = ['age']

class Test2(Privacy):
    privates = ['name', 'pay']
    def __init__(self):
        self.__dict__['name'] = 'Tom'

x = Test1()
y = Test2()

x.name = 'Bob'
y.name = 'Sue'   # Fails
y.age  = 30
x.age  = 40      # Fails


In fact, this is a first-cut solution for an implementation of attribute privacy in Python
(i.e., disallowing changes to attribute names outside a class). Although Python doesn’t
support private declarations per se, techniques like this can emulate much of their
purpose. This is a partial solution, though; to make it more effective, it must be
augmented to allow subclasses to set private attributes more naturally, too, and to use
__getattr__ and a wrapper (sometimes called a proxy) class to check for private
attribute fetches.
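
Because the first failing assignment in the listing above ends the script with an
exception, the remaining lines never get a chance to run. The following sketch repeats
the classes and wraps each assignment in a try statement so that all four outcomes can
be seen in one run; the tryset helper is hypothetical and exists only for this test:

```python
class PrivateExc(Exception): pass

class Privacy:
    def __setattr__(self, attrname, value):
        if attrname in self.privates:
            raise PrivateExc(attrname, self)
        self.__dict__[attrname] = value

class Test1(Privacy):
    privates = ['age']

class Test2(Privacy):
    privates = ['name', 'pay']
    def __init__(self):
        self.__dict__['name'] = 'Tom'       # Bypass __setattr__ inside the class

def tryset(obj, attrname, value):           # Hypothetical helper for this demo
    try:
        setattr(obj, attrname, value)       # Same as obj.attrname = value
    except PrivateExc:
        return 'private'
    return 'ok'

x, y = Test1(), Test2()
print(tryset(x, 'name', 'Bob'), tryset(y, 'name', 'Sue'))   # ok private
print(tryset(y, 'age', 30), tryset(x, 'age', 40))           # ok private
print(x.name, y.name, y.age)                                # Bob Tom 30
```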


We'll postpone a more complete solution to attribute privacy until Chapter 38, where 
we'll use class decorators to intercept and validate attributes more generally. Even 
though privacy can be emulated this way, though, it almost never is in practice. Python 
programmers are able to write large OOP frameworks and applications without private 
declarations—an interesting finding about access controls in general that is beyond the 
scope of our purposes here. 


Catching attribute references and assignments is generally a useful technique; it
supports delegation, a design technique that allows controller objects to wrap up embedded
objects, add new behaviors, and route other operations back to the wrapped objects
(more on delegation and wrapper classes in Chapter 30).

String Representation: __repr__ and __str__


The next example exercises the __init__ constructor and the __add__ overload method,
both of which we’ve already seen, as well as defining a __repr__ method that returns a
string representation for instances. String formatting is used to convert the managed
self.data object to a string. If defined, __repr__ (or its sibling, __str__) is called
automatically when class instances are printed or converted to strings. These methods allow
you to define a better display format for your objects than the default instance display.


The default display of instance objects is neither useful nor pretty: 


>>> class adder:
        def __init__(self, value=0):
            self.data = value                       # Initialize data
        def __add__(self, other):
            self.data += other                      # Add other in-place (bad!)

>>> x = adder()                         # Default displays
>>> print(x)
<__main__.adder object at 0x025D66B0>
>>> x
<__main__.adder object at 0x025D66B0>


But coding or inheriting string representation methods allows us to customize the 
display: 


>>> class addrepr(adder):                           # Inherit __init__, __add__
        def __repr__(self):                         # Add string representation
            return 'addrepr(%s)' % self.data        # Convert to as-code string

>>> x = addrepr(2)                      # Runs __init__
>>> x + 1                               # Runs __add__
>>> x                                   # Runs __repr__
addrepr(3)
>>> print(x)                            # Runs __repr__
addrepr(3)
>>> str(x), repr(x)                     # Runs __repr__ for both
('addrepr(3)', 'addrepr(3)')

So why two display methods? Mostly, to support different audiences. In full detail:


• __str__ is tried first for the print operation and the str built-in function (the
  internal equivalent of which print runs). It generally should return a user-friendly
  display.

• __repr__ is used in all other contexts: for interactive echoes, the repr function, and
  nested appearances, as well as by print and str if no __str__ is present. It should
  generally return an as-code string that could be used to re-create the object, or a
  detailed display for developers.


In a nutshell, __repr__ is used everywhere, except by print and str when a __str__ is
defined. Note, however, that while printing falls back on __repr__ if no __str__ is
defined, the inverse is not true: other contexts, such as interactive echoes, use
__repr__ only and don’t try __str__ at all:

>>> class addstr(adder):
        def __str__(self):                          # __str__ but no __repr__
            return '[Value: %s]' % self.data        # Convert to nice string

>>> x = addstr(3)
>>> x + 1
>>> x                                   # Default __repr__
<__main__.addstr object at 0x00B35EF0>
>>> print(x)                            # Runs __str__
[Value: 4]
>>> str(x), repr(x)
('[Value: 4]', '<__main__.addstr object at 0x00B35EF0>')


Because of this, __repr__ may be best if you want a single display for all contexts. By
defining both methods, though, you can support different displays in different
contexts: for example, an end-user display with __str__, and a low-level display for
programmers to use during development with __repr__. In effect, __str__ simply
overrides __repr__ for user-friendly display contexts:

>>> class addboth(adder):
        def __str__(self):
            return '[Value: %s]' % self.data        # User-friendly string
        def __repr__(self):
            return 'addboth(%s)' % self.data        # As-code string

>>> x = addboth(4)
>>> x + 1
>>> x                                   # Runs __repr__
addboth(5)
>>> print(x)                            # Runs __str__
[Value: 5]
>>> str(x), repr(x)
('[Value: 5]', 'addboth(5)')

I should mention two usage notes here. First, keep in mind that __str__ and
__repr__ must both return strings; other result types are not converted and raise errors,
so be sure to run them through a converter if needed. Second, depending on a
container’s string-conversion logic, the user-friendly display of __str__ might only apply
when objects appear at the top level of a print operation; objects nested in larger objects
might still print with their __repr__ or its default. The following illustrates both of these
points:

>>> class Printer:
        def __init__(self, val):
            self.val = val
        def __str__(self):                  # Used for instance itself
            return str(self.val)            # Convert to a string result

>>> objs = [Printer(2), Printer(3)]
>>> for x in objs: print(x)                 # __str__ run when instance printed
                                            # But not when instance in a list!
2
3
>>> print(objs)
[<__main__.Printer object at 0x025D06F0>, <__main__.Printer object at ...more...
>>> objs
[<__main__.Printer object at 0x025D06F0>, <__main__.Printer object at ...more...


To ensure that a custom display is run in all contexts regardless of the container, code
__repr__, not __str__; the former is run in all cases if the latter doesn’t apply:


>>> class Printer:
        def __init__(self, val):
            self.val = val
        def __repr__(self):                 # __repr__ used by print if no __str__
            return str(self.val)            # __repr__ used if echoed or nested

>>> objs = [Printer(2), Printer(3)]
>>> for x in objs: print(x)                 # No __str__: runs __repr__

2
3
>>> print(objs)                             # Runs __repr__, not __str__
[2, 3]
>>> objs
[2, 3]


In practice, __str__ (or its low-level relative, __repr__) seems to be the second most
commonly used operator overloading method in Python scripts, behind __init__. Any
time you can print an object and see a custom display, one of these two tools is probably
in use.
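
As one more illustration of how the two methods divide their work, the following sketch
runs a single object through string formatting and container displays. The Point class
is hypothetical; its __repr__ uses an as-code format on purpose so that the string can
be run to rebuild an equivalent object:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __str__(self):
        return '(%s, %s)' % (self.x, self.y)            # User-friendly
    def __repr__(self):
        return 'Point(%r, %r)' % (self.x, self.y)       # As-code

p = Point(1, 2)
print('%s | %r' % (p, p))           # (1, 2) | Point(1, 2)
print(str(p), repr(p))              # (1, 2) Point(1, 2)
print([p], {'key': p})              # [Point(1, 2)] {'key': Point(1, 2)}
print(eval(repr(p)).x)              # 1: the as-code string re-creates the object
```

The %s and %r formatting codes select __str__ and __repr__ respectively, while the
list and dictionary displays use __repr__ for the objects nested inside them.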


Right-Side and In-Place Addition: __radd__ and __iadd__


Technically, the __add__ method that appeared in the prior example does not support
the use of instance objects on the right side of the + operator. To implement such
expressions, and hence support commutative-style operators, code the __radd__
method as well. Python calls __radd__ only when the object on the right side of the + is
your class instance, but the object on the left is not an instance of your class. The
__add__ method for the object on the left is called instead in all other cases:


>>> class Commuter:
        def __init__(self, val):
            self.val = val
        def __add__(self, other):
            print('add', self.val, other)
            return self.val + other
        def __radd__(self, other):
            print('radd', self.val, other)
            return other + self.val

>>> x = Commuter(88)
>>> y = Commuter(99)
>>> x + 1                               # __add__: instance + noninstance
add 88 1
89
>>> 1 + y                               # __radd__: noninstance + instance
radd 99 1
100
>>> x + y                               # __add__: instance + instance, triggers __radd__
add 88 <__main__.Commuter object at 0x02630910>
radd 99 88
187


Notice how the order is reversed in __radd__: self is really on the right of the +, and
other is on the left. Also note that x and y are instances of the same class here; when
instances of different classes appear mixed in an expression, Python prefers the class
of the one on the left. When we add the two instances together, Python runs __add__,
which in turn triggers __radd__ by simplifying the left operand.


In more realistic classes where the class type may need to be propagated in results,
things can become trickier: type testing may be required to tell whether it’s safe to
convert and thus avoid nesting. For instance, without the isinstance test in the
following, we could wind up with a Commuter whose val is another Commuter when two
instances are added and __add__ triggers __radd__:


>>> class Commuter:                          # Propagate class type in results
        def __init__(self, val):
            self.val = val
        def __add__(self, other):
            if isinstance(other, Commuter): other = other.val
            return Commuter(self.val + other)
        def __radd__(self, other):
            return Commuter(other + self.val)
        def __str__(self):
            return '<Commuter: %s>' % self.val

>>> x = Commuter(88)
>>> y = Commuter(99)
>>> print(x + 10)                            # Result is another Commuter instance
<Commuter: 98>
>>> print(10 + y)
<Commuter: 109>

>>> z = x + y                                # Not nested: doesn't recur to __radd__
>>> print(z)
<Commuter: 187>
>>> print(z + 10)
<Commuter: 197>
>>> print(z + z)
<Commuter: 374>


In-Place Addition


To also implement += in-place augmented addition, code either an __iadd__ or an
__add__. The latter is used if the former is absent. In fact, the prior section’s Commuter
class supports += already for this reason, but __iadd__ allows for more efficient in-place
changes:

>>> class Number:
        def __init__(self, val):
            self.val = val
        def __iadd__(self, other):               # __iadd__ explicit: x += y
            self.val += other                    # Usually returns self
            return self

>>> x = Number(5)
>>> x += 1
>>> x += 1
>>> x.val
7

>>> class Number:
        def __init__(self, val):
            self.val = val
        def __add__(self, other):                # __add__ fallback: x = (x + y)
            return Number(self.val + other)      # Propagates class type

>>> x = Number(5)
>>> x += 1
>>> x += 1
>>> x.val
7


Every binary operator has similar right-side and in-place overloading methods that
work the same (e.g., __mul__, __rmul__, and __imul__). Right-side methods are an
advanced topic and tend to be fairly rarely used in practice; you only code them when
you need operators to be commutative, and then only if you need to support such
operators at all. For instance, a Vector class may use these tools, but an Employee or
Button class probably would not.
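
To make the multiplication trio concrete, here is a minimal sketch of the kind of Vector
class mentioned above. The class, its items attribute, and its scaling behavior are all
assumptions made for illustration rather than code from this book's examples:

```python
class Vector:
    """Hypothetical class: scales every item by a number."""
    def __init__(self, *items):
        self.items = list(items)
    def __mul__(self, factor):               # vector * number
        return Vector(*[item * factor for item in self.items])
    def __rmul__(self, factor):              # number * vector
        return self * factor                 # Scaling is commutative: reuse __mul__
    def __imul__(self, factor):              # vector *= number, changed in place
        self.items = [item * factor for item in self.items]
        return self                          # In-place methods return self
    def __repr__(self):
        return 'Vector(%s)' % ', '.join(repr(item) for item in self.items)

v = Vector(1, 2, 3)
left = v * 2                                 # Runs __mul__
right = 2 * v                                # Runs __rmul__
same = v
v *= 10                                      # Runs __imul__: no new object is made
print(left, right, v, same is v)
```

Because __imul__ returns self, the name same still refers to the object that v names
after the *= statement; without __imul__, Python would fall back on __mul__ and bind v
to a new Vector instead.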


Call Expressions: __call__


The __call__ method is called when your instance is called. No, this isn’t a circular
definition: if defined, Python runs a __call__ method for function call expressions
applied to your instances, passing along whatever positional or keyword arguments
were sent:

>>> class Callee:
        def __call__(self, *pargs, **kargs):       # Intercept instance calls
            print('Called:', pargs, kargs)         # Accept arbitrary arguments

>>> C = Callee()
>>> C(1, 2, 3)                                     # C is a callable object
Called: (1, 2, 3) {}
>>> C(1, 2, 3, x=4, y=5)
Called: (1, 2, 3) {'y': 5, 'x': 4}


More formally, all the argument-passing modes we explored in Chapter 18 are
supported by the __call__ method: whatever is passed to the instance is passed to this
method, along with the usual implied instance argument. For example, the method
definitions:


class C:
    def __call__(self, a, b, c=5, d=6): ...          # Normals and defaults

class C:
    def __call__(self, *pargs, **kargs): ...         # Collect arbitrary arguments

class C:
    def __call__(self, *pargs, d=6, **kargs): ...    # 3.0 keyword-only argument


all match all the following instance calls:


X = C()
X(1, 2)                                              # Omit defaults
X(1, 2, 3, 4)                                        # Positionals
X(a=1, b=2, d=4)                                     # Keywords
X(*[1, 2], **dict(c=3, d=4))                         # Unpack arbitrary arguments
X(1, *(2,), c=3, **dict(d=4))                        # Mixed modes


The net effect is that classes and instances with a __call__ support the exact same 
argument syntax and semantics as normal functions and methods. 


Intercepting call expressions like this allows class instances to emulate the look and
feel of things like functions, but also retain state information for use during calls (we
saw a similar example while exploring scopes in Chapter 17, but you should be more
familiar with operator overloading here):


>>> class Prod:
        def __init__(self, value):             # Accept just one argument
            self.value = value
        def __call__(self, other):
            return self.value * other

>>> x = Prod(2)                                # "Remembers" 2 in state
>>> x(3)                                       # 3 (passed) * 2 (state)
6
>>> x(4)
8


In this example, the __call__ may seem a bit gratuitous at first glance. A simple method 
can provide similar utility: 

>>> class Prod:
        def __init__(self, value):
            self.value = value
        def comp(self, other):
            return self.value * other

>>> x = Prod(3)
>>> x.comp(3)
9
>>> x.comp(4)
12


However, __call__ can become more useful when interfacing with APIs that expect
functions: it allows us to code objects that conform to an expected function call
interface, but also retain state information. In fact, it’s probably the third most commonly
used operator overloading method, behind the __init__ constructor and the __str__
and __repr__ display-format alternatives.
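
One everyday API that expects a function is the key argument of the built-in sorted.
The sketch below passes it an instance instead; the ByDistance class and its target
attribute are hypothetical names chosen for this illustration:

```python
class ByDistance:
    """Hypothetical key object: ranks numbers by distance from a target."""
    def __init__(self, target):
        self.target = target                 # State retained between calls
    def __call__(self, value):               # Called once per item by sorted()
        return abs(value - self.target)

nearest_to_10 = ByDistance(10)
ranked = sorted([1, 12, 7, 30, 9], key=nearest_to_10)
print(ranked)
```

The sorted call never needs to know it was handed a class instance: anything that
accepts one argument in a call expression satisfies its interface.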


Function Interfaces and Callback-Based Code 


As an example, the tkinter GUI toolkit (named Tkinter in Python 2.6) allows you to 
register functions as event handlers (a.k.a. callbacks); when events occur, tkinter calls 
the registered objects. If you want an event handler to retain state between events, you 
can register either a class’s bound method or an instance that conforms to the expected 
interface with __call__. In this section’s code, both x.comp from the second example
and x from the first can pass as function-like objects this way.


I’ll have more to say about bound methods in the next chapter, but for now, here’s a
hypothetical example of __call__ applied to the GUI domain. The following class
defines an object that supports a function-call interface, but also has state information
that remembers the color a button should change to when it is later pressed:


class Callback:
    def __init__(self, color):              # Function + state information
        self.color = color
    def __call__(self):                     # Support calls with no arguments
        print('turn', self.color)


Now, in the context of a GUI, we can register instances of this class as event handlers 
for buttons, even though the GUI expects to be able to invoke event handlers as simple 
functions with no arguments: 


cb1 = Callback('blue') # Remember blue 
cb2 = Callback('green') 


B1 = Button(command=cb1) # Register handlers 
B2 = Button(command=cb2) # Register handlers 


When the button is later pressed, the instance object is called as a simple function, 
exactly like in the following calls. Because it retains state as instance attributes, though, 
it remembers what to do: 


cb1()                                   # On events: prints 'turn blue'
cb2()                                   # Prints 'turn green'


In fact, this is probably the best way to retain state information in the Python
language: better than the techniques discussed earlier for functions (global variables,
enclosing function scope references, and default mutable arguments). With OOP, the
state remembered is made explicit with attribute assignments.


Before we move on, there are two other ways that Python programmers sometimes tie 
information to a callback function like this. One option is to use default arguments in 
lambda functions: 


cb3 = (lambda color='red': 'turn ' + color)    # Or: defaults
print(cb3())


The other is to use bound methods of a class. A bound method object is a kind of object 
that remembers the self instance and the referenced function. A bound method may 
therefore be called as a simple function without an instance later: 


class Callback:
    def __init__(self, color):              # Class with state information
        self.color = color
    def changeColor(self):                  # A normal named method
        print('turn', self.color)

cb1 = Callback('blue')
cb2 = Callback('yellow')

B1 = Button(command=cb1.changeColor)        # Reference, but don't call
B2 = Button(command=cb2.changeColor)        # Remembers function+self


In this case, when this button is later pressed it’s as if the GUI does this, which invokes
the changeColor method to process the object’s state information:

object = Callback('blue')
cb = object.changeColor                     # Registered event handler
cb()                                        # On event prints 'turn blue'


This technique is simpler, but less general than overloading calls with __call__; again, 
watch for more about bound methods in the next chapter. 


You'll also see another __call__ example in Chapter 31, where we will use it to imple- 
ment something known as a function decorator—a callable object often used to add a 
layer of logic on top of an embedded function. Because __call__ allows us to attach 
state information to a callable object, it’s a natural implementation technique for a 
function that must remember and call another function. 
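
As a preview, the following sketch shows the general shape such a decorator can take.
It is only one plausible coding, not necessarily the version presented in Chapter 31;
the CallCounter class and its calls attribute are assumptions made for illustration:

```python
class CallCounter:
    """Hypothetical decorator class: counts calls to the wrapped function."""
    def __init__(self, func):                # Runs at decoration time
        self.func = func                     # Remember the original function
        self.calls = 0
    def __call__(self, *pargs, **kargs):     # Runs on later calls to the name
        self.calls += 1
        return self.func(*pargs, **kargs)

@CallCounter                                 # Same as: square = CallCounter(square)
def square(x):
    return x ** 2

results = [square(2), square(3)]
print(results, square.calls)
```

After decoration, the name square refers to a CallCounter instance, so each call is
routed through __call__, which updates the counter and then runs the saved function.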


Comparisons: __lt__, __gt__, and Others


As suggested in Table 29-1, classes can define methods to catch all six comparison 
operators: <, >, <=, >=, ==, and !=. These methods are generally straightforward to use, 
but keep the following qualifications in mind: 

• Unlike the __add__/__radd__ pairings discussed earlier, there are no right-side
  variants of comparison methods. Instead, reflective methods are used when only
  one operand supports comparison (e.g., __lt__ and __gt__ are each other’s
  reflection).

• There are almost no implicit relationships among the comparison operators. In 2.6,
  the truth of == does not imply that != is false, for example, so both __eq__ and
  __ne__ should be defined to ensure that both operators behave correctly. In 3.0,
  a missing __ne__ defaults to inverting the result of __eq__, but no other operator
  is derived from another, so coding both remains the portable choice.

• In Python 2.6, a __cmp__ method is used by all comparisons if no more specific
  comparison methods are defined; it returns a number that is less than, equal to, or
  greater than zero, to signal less than, equal, and greater than results for the
  comparison of its two arguments (self and another operand). This method often uses
  the cmp(x, y) built-in to compute its result. Both the __cmp__ method and the
  cmp built-in function are removed in Python 3.0: use the more specific methods
  instead.
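
The reflection rule in the first of these qualifications can be seen in a short sketch.
The Version class here is hypothetical, and its methods print a trace only to show
which one Python selects:

```python
class Version:
    """Hypothetical class wrapping a single number for comparisons."""
    def __init__(self, number):
        self.number = number
    def __lt__(self, other):                 # Version < number
        print('lt', self.number, other)
        return self.number < other
    def __gt__(self, other):                 # Version > number
        print('gt', self.number, other)
        return self.number > other

v = Version(3)
direct = v < 5                               # Runs __lt__ directly
reflected = 5 < v                            # int can't handle it: runs v.__gt__(5)
print(direct, reflected)
```

When the instance appears on the right, the integer on the left does not know how to
compare itself to a Version, so Python swaps the operands and runs the reflected
method instead.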


We don’t have space for an in-depth exploration of comparison methods, but as a quick 
introduction, consider the following class and test code: 


class C:
    data = 'spam'
    def __gt__(self, other):                 # 3.0 and 2.6 version
        return self.data > other
    def __lt__(self, other):
        return self.data < other

X = C()
print(X > 'ham')                             # True  (runs __gt__)
print(X < 'ham')                             # False (runs __lt__)


When run under Python 3.0 or 2.6, the prints at the end display the expected results 
noted in their comments, because the class’s methods intercept and implement com- 
parison expressions. 


The 2.6 __cmp__ Method (Removed in 3.0) 


In Python 2.6, the __cmp__ method is used as a fallback if more specific methods are
not defined: its integer result is used to evaluate the operator being run. The following
produces the same result under 2.6, for example, but fails in 3.0 because __cmp__ is no
longer used:


class C:
    data = 'spam'                            # 2.6 only
    def __cmp__(self, other):                # __cmp__ not used in 3.0
        return cmp(self.data, other)         # cmp not defined in 3.0

X = C()
print(X > 'ham')                             # True  (runs __cmp__)
print(X < 'ham')                             # False (runs __cmp__)




Notice that this fails in 3.0 because __cmp__ is no longer special, not because the cmp
built-in function is no longer present. If we change the prior class to the following to
try to simulate the cmp call, the code still works in 2.6 but fails in 3.0:


class C:
    data = 'spam'
    def __cmp__(self, other):
        return (self.data > other) - (self.data < other)


So why, you might be asking, did I just show you a comparison method
that is no longer supported in 3.0? While it would be easier to erase
history entirely, this book is designed to support both 2.6 and 3.0
readers. Because __cmp__ may appear in code 2.6 readers must reuse or
maintain, it’s fair game in this book. Moreover, __cmp__ was removed
more abruptly than the __getslice__ method described earlier, and so
may endure longer. If you use 3.0, though, or care about running your
code under 3.0 in the future, don’t use __cmp__ anymore: use the more
specific comparison methods instead.
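
If you do need to carry old __cmp__ logic forward, one possible approach (a sketch only,
not a technique prescribed by this book) is to move the three-way test into a normally
named helper and code the specific methods in terms of it. The _compare name below is an
arbitrary choice with no special meaning to Python:

```python
class C:
    data = 'spam'
    def _compare(self, other):               # Hypothetical helper, not special
        return (self.data > other) - (self.data < other)
    def __lt__(self, other): return self._compare(other) < 0
    def __gt__(self, other): return self._compare(other) > 0
    def __eq__(self, other): return self._compare(other) == 0

X = C()
results = (X > 'ham', X < 'ham', X == 'spam')
print(results)
```

Because only specific comparison methods are defined, this version behaves the same
under both 2.6 and 3.0; the remaining operators (<=, >=, and !=) could be added in the
same fashion if they are needed.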


Boolean Tests: __bool__ and __len__


As mentioned earlier, classes may also define methods that give the Boolean nature of 
their instances—in Boolean contexts, Python first tries __bool__ to obtain a direct 
Boolean value and then, if that’s missing, tries __len__ to determine a truth value from 
the object length. The first of these generally uses object state or other information to 
produce a Boolean result: 


>>> class Truth:
        def __bool__(self): return True

>>> X = Truth()
>>> if X: print('yes!')
...
yes!

>>> class Truth:
        def __bool__(self): return False

>>> X = Truth()
>>> bool(X)
False


If this method is missing, Python falls back on length because a nonempty object is 
considered true (i.e., a nonzero length is taken to mean the object is true, and a zero 
length means it is false): 


>>> class Truth:
        def __len__(self): return 0

>>> X = Truth()
>>> if not X: print('no!')
...
no!


If both methods are present Python prefers __bool__ over __len__, because it is more
specific:


>>> class Truth:
        def __bool__(self): return True      # 3.0 tries __bool__ first
        def __len__(self): return 0          # 2.6 tries __len__ first

>>> X = Truth()
>>> if X: print('yes!')
...
yes!

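
For a slightly more realistic sketch of this precedence, the class below gives length and
truth different meanings. The Inbox class and its method names are invented for this
illustration:

```python
class Inbox:
    """Hypothetical container: true only when it holds unread messages."""
    def __init__(self):
        self.messages = []
        self.unread = 0
    def add(self, text):
        self.messages.append(text)
        self.unread += 1
    def mark_all_read(self):
        self.unread = 0
    def __len__(self):                       # Length: all stored messages
        return len(self.messages)
    def __bool__(self):                      # Truth: 3.0 consults this first
        return self.unread > 0
    __nonzero__ = __bool__                   # 2.6 name for the same hook

box = Inbox()
empty_truth = bool(box)                      # No messages: false
box.add('hello')
unread_truth = bool(box)                     # One unread message: true
box.mark_all_read()
read_truth = bool(box)                       # Nonzero length, but __bool__ wins
print(empty_truth, unread_truth, read_truth, len(box))
```

The last test is the interesting one: the object still has a nonzero length, yet it is
false because the more specific truth method is consulted before the length fallback.
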
If neither truth method is defined, the object is vacuously considered true (which has 
potential implications for metaphysically inclined readers!): 


>>> class Truth:
        pass

>>> X = Truth()
>>> bool(X)
True


And now that we’ve managed to cross over into the realm of philosophy, let’s move on 
to look at one last overloading context: object demise. 


Booleans in Python 2.6 


Python 2.6 users should use __nonzero__ instead of __bool__ in all of the code in the
section “Boolean Tests: __bool__ and __len__”. Python 3.0 renamed the
2.6 __nonzero__ method to __bool__, but Boolean tests work the same otherwise (both
3.0 and 2.6 use __len__ as a fallback).


If you don’t use the 2.6 name, the very first test in this section will work the same for
you anyhow, but only because __bool__ is not recognized as a special method name in
2.6, and objects are considered true by default!


To witness this version difference live, you need to return False: 


C:\misc> c:\python30\python
>>> class C:
...     def __bool__(self):
...         print('in bool')
...         return False
...
>>> X = C()
>>> bool(X)
in bool
False
>>> if X: print(99)
...
in bool


This works as advertised in 3.0. In 2.6, though, __bool__ is ignored and the object is
always considered true:


C:\misc> c:\python26\python
>>> class C:
...     def __bool__(self):
...         print('in bool')
...         return False
...
>>> X = C()
>>> bool(X)
True
>>> if X: print(99)
...
99


In 2.6, use __nonzero__ for Boolean values (or return 0 from the __len__ fallback method
to designate false):


C:\misc> c:\python26\python
>>> class C:
...     def __nonzero__(self):
...         print('in nonzero')
...         return False
...
>>> X = C()
>>> bool(X)
in nonzero
False
>>> if X: print(99)
...
in nonzero


But keep in mind that __nonzero__ works in 2.6 only; if used in 3.0 it will be silently
ignored and the object will be classified as true by default, just like using __bool__ in
2.6!


Object Destruction: __del__


We've seen how the __init__ constructor is called whenever an instance is generated.
Its counterpart, the destructor method __del__, is run automatically when an instance’s
space is being reclaimed (i.e., at “garbage collection” time):


>>> class Life:
        def __init__(self, name='unknown'):
            print('Hello', name)
            self.name = name
        def __del__(self):
            print('Goodbye', self.name)

>>> brian = Life('Brian')
Hello Brian
>>> brian = 'loretta'
Goodbye Brian


Here, when brian is assigned a string, we lose the last reference to the Life instance
and so trigger its destructor method. This works, and it may be useful for implementing
some cleanup activities (such as terminating server connections). However, destructors
are not as commonly used in Python as in some OOP languages, for a number of
reasons.


For one thing, because Python automatically reclaims all space held by an instance
when the instance is reclaimed, destructors are not necessary for space management.*
For another, because you cannot always easily predict when an instance will be
reclaimed, it’s often better to code termination activities in an explicitly called method
(or try/finally statement, described in the next part of the book); in some cases, there
may be lingering references to your objects in system tables that prevent destructors
from running.
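
The explicitly called alternative might look like the following sketch. The Connection
class is hypothetical and contacts no real server; it only records whether it is open,
and the host name is a placeholder:

```python
class Connection:
    """Hypothetical resource wrapper; no real server is contacted."""
    def __init__(self, host):
        self.host = host
        self.open = True
        print('Connected to', host)
    def close(self):                         # Explicit, predictable cleanup
        if self.open:
            self.open = False
            print('Closed', self.host)
    def __del__(self):                       # Safety net only: timing not guaranteed
        self.close()

conn = Connection('example.org')
try:
    status = 'working with ' + conn.host
finally:
    conn.close()                             # Runs whether or not an exception occurs
print(status, conn.open)
```

Here the cleanup happens at a known point in the program, and the destructor serves
only as a backup; because close checks the open flag, running it a second time when
the instance is eventually reclaimed is harmless.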


In fact, __del__ can be tricky to use for even more subtle reasons.
Exceptions raised within it, for example, simply print a warning message
to sys.stderr (the standard error stream) rather than triggering an
exception event, because of the unpredictable context under which it is
run by the garbage collector. In addition, cyclic (a.k.a. circular)
references among objects may prevent garbage collection from happening
when you expect it to; an optional cycle detector, enabled by default,
can automatically collect such objects eventually, but only if they do not
have __del__ methods. Since this is relatively obscure, we’ll ignore
further details here; see Python’s standard manuals’ coverage of both
__del__ and the gc garbage collector module for more information.


Chapter Summary 


That’s as many overloading examples as we have space for here. Most of the other 
operator overloading methods work similarly to the ones we’ve explored, and all are 
just hooks for intercepting built-in type operations; some overloading methods, for 
example, have unique argument lists or return values. We’ll see a few others in action 
later in the book: 


• Chapter 33 uses the __enter__ and __exit__ with statement context manager
  methods.

• Chapter 37 uses the __get__ and __set__ class descriptor fetch/set methods.

• Chapter 39 uses the __new__ object creation method in the context of metaclasses.


* In the current C implementation of Python, you also don’t need to close file objects held by the instance in 
destructors because they are automatically closed when reclaimed. However, as mentioned in Chapter 9, it’s 
better to explicitly call file close methods because auto-close-on-reclaim is a feature of the implementation, 
not of the language itself (this behavior can vary under Jython, for instance). 

In addition, some of the methods we’ve studied here, such as __call__ and __str__,
will be employed by later examples in this book. For complete coverage, though, I’ll
defer to other documentation sources: see Python’s standard language manual or
reference books for details on additional overloading methods.


In the next chapter, we leave the realm of class mechanics behind to explore common 
design patterns—the ways that classes are commonly used and combined to optimize 
code reuse. Before you read on, though, take a moment to work through the chapter
quiz below to review the concepts we’ve covered.


Test Your Knowledge: Quiz 


1. What two operator overloading methods can you use to support iteration in your
   classes?
2. What two operator overloading methods handle printing, and in what contexts?
3. How can you intercept slice operations in a class?
4. How can you catch in-place addition in a class?
5. When should you provide operator overloading?


Test Your Knowledge: Answers 


1. Classes can support iteration by defining (or inheriting) __getitem__ or __iter__.
   In all iteration contexts, Python tries to use __iter__ (which returns an object that
   supports the iteration protocol with a __next__ method) first: if no __iter__ is
   found by inheritance search, Python falls back on the __getitem__ indexing method
   (which is called repeatedly, with successively higher indexes).

2. The __str__ and __repr__ methods implement object print displays. The former is
   called by the print and str built-in functions; the latter is called by print and str
   if there is no __str__, and always by the repr built-in, interactive echoes, and nested
   appearances. That is, __repr__ is used everywhere, except by print and str when
   a __str__ is defined. A __str__ is usually used for user-friendly displays;
   __repr__ gives extra details or the object’s as-code form.

3. Slicing is caught by the __getitem__ indexing method: it is called with a slice object,
   instead of a simple index. In Python 2.6, __getslice__ (defunct in 3.0) may be used
   as well.

4. In-place addition tries __iadd__ first, and __add__ with an assignment second. The
   same pattern holds true for all binary operators. The __radd__ method is also
   available for right-side addition.

5. When a class naturally matches, or needs to emulate, a built-in type’s interfaces.
   For example, collections might imitate sequence or mapping interfaces. You
   generally shouldn’t implement expression operators if they don’t naturally map to
   your objects, though; use normally named methods instead.


CHAPTER 30
Designing with Classes


So far in this part of the book, we’ve concentrated on using Python’s OOP tool, the 
class. But OOP is also about design issues—i.e., how to use classes to model useful 
objects. This chapter will touch on a few core OOP ideas and present some additional 
examples that are more realistic than those shown so far. 


Along the way, we’ll code some common OOP design patterns in Python, such as 
inheritance, composition, delegation, and factories. We’ll also investigate some design- 
focused class concepts, such as pseudoprivate attributes, multiple inheritance, and 
bound methods. Many of the design terms mentioned here require more explanation 
than I can provide in this book; if this material sparks your curiosity, I suggest exploring 
a text on OOP design or design patterns as a next step. 


Python and OOP 


Let’s begin with a review—Python’s implementation of OOP can be summarized by 
three ideas: 


Inheritance
Inheritance is based on attribute lookup in Python (in X.name expressions).

Polymorphism
In X.method, the meaning of method depends on the type (class) of X.

Encapsulation
Methods and operators implement behavior; data hiding is a convention by default.


By now, you should have a good feel for what inheritance is all about in Python. We’ve 
also talked about Python’s polymorphism a few times already; it flows from Python’s 
lack of type declarations. Because attributes are always resolved at runtime, objects that 
implement the same interfaces are interchangeable; clients don’t need to know what 
sorts of objects are implementing the methods they call. 

Encapsulation means packaging in Python: that is, hiding implementation details
behind an object’s interface. It does not mean enforced privacy, though that can be
implemented with code, as we’ll see in Chapter 38. Encapsulation allows the
implementation of an object’s interface to be changed without impacting the users of
that object.


Overloading by Call Signatures (or Not) 


Some OOP languages also define polymorphism to mean overloading functions based 
on the type signatures of their arguments. But because there are no type declarations 
in Python, this concept doesn’t really apply; polymorphism in Python is based on object 
interfaces, not types. 


You can try to overload methods by their argument lists, like this: 


class C:
    def meth(self, x):
        ...
    def meth(self, x, y, z):
        ...


This code will run, but because the def simply assigns an object to a name in the class’s 
scope, the last definition of the method function is the only one that will be retained 
(it’s just as if you say X = 1 and then X = 2; X will be 2). 
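
The following sketch makes the effect visible by filling in the method bodies, which are
invented here purely for illustration:

```python
class C:
    def meth(self, x):                       # Replaced by the def below
        return 'one argument'
    def meth(self, x, y, z):                 # Last assignment to meth wins
        return 'three arguments'

X = C()
three = X.meth(1, 2, 3)
try:
    X.meth(1)                                # The one-argument version is gone
except TypeError:
    one = 'TypeError'
print(three, one)
```

Only the three-argument version survives, so a call that passes a single argument
fails with an argument-count error rather than selecting the first definition.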


Type-based selections can always be coded using the type-testing ideas we met in 
Chapters 4 and 9, or the argument list tools introduced in Chapter 18: 

class C:
    def meth(self, *args):
        if len(args) == 1:
            ...
        elif type(args[0]) == int:
            ...


You normally shouldn’t do this, though. As described in Chapter 16, you should write
your code to expect an object interface, not a specific data type. That way, it will be
useful for a broader category of types and applications, both now and in the future:

class C:
    def meth(self, x):
        x.operation()                        # Assume x does the right thing


It’s also generally considered better to use distinct method names for distinct opera- 
tions, rather than relying on call signatures (no matter what language you code in). 


Although Python’s object model is straightforward, much of the art in OOP is in the 
way we combine classes to achieve a program’s goals. The next section begins a tour 
of some of the ways larger programs use classes to their advantage. 


OOP and Inheritance: “Is-a” Relationships


We’ve explored the mechanics of inheritance in depth already, but I’d like to show you 
an example of how it can be used to model real-world relationships. From a program- 
mer’s point of view, inheritance is kicked off by attribute qualifications, which trigger 
searches for names in instances, their classes, and then any superclasses. From a de- 
signer’s point of view, inheritance is a way to specify set membership: a class defines a 
set of properties that may be inherited and customized by more specific sets (i.e., 
subclasses). 


To illustrate, let’s put that pizza-making robot we talked about at the start of this part 
of the book to work. Suppose we’ve decided to explore alternative career paths and 
open a pizza restaurant. One of the first things we’ll need to do is hire employees to 
serve customers, prepare the food, and so on. Being engineers at heart, we’ve decided 
to build a robot to make the pizzas; but being politically and cybernetically correct, 
we've also decided to make our robot a full-fledged employee with a salary. 


Our pizza shop team can be defined by the four classes in the example file, 
employees.py. The most general class, Employee, provides common behavior such as 
bumping up salaries (giveRaise) and printing (__repr__). There are two kinds of em- 
ployees, and so two subclasses of Employee: Chef and Server. Both override the inherited 
work method to print more specific messages. Finally, our pizza robot is modeled by an 
even more specific class: PizzaRobot is a kind of Chef, which is a kind of Employee. In 
OOP terms, we call these relationships “is-a” links: a robot is a chef, which is a(n) 
employee. Here’s the employees.py file: 


class Employee:
    def __init__(self, name, salary=0):
        self.name = name
        self.salary = salary
    def giveRaise(self, percent):
        self.salary = self.salary + (self.salary * percent)
    def work(self):
        print(self.name, "does stuff")
    def __repr__(self):
        return "<Employee: name=%s, salary=%s>" % (self.name, self.salary)

class Chef(Employee):
    def __init__(self, name):
        Employee.__init__(self, name, 50000)
    def work(self):
        print(self.name, "makes food")

class Server(Employee):
    def __init__(self, name):
        Employee.__init__(self, name, 40000)
    def work(self):
        print(self.name, "interfaces with customer")

class PizzaRobot(Chef):
    def __init__(self, name):
        Chef.__init__(self, name)
    def work(self):
        print(self.name, "makes pizza")

if __name__ == "__main__":
    bob = PizzaRobot('bob')                  # Make a robot named bob
    print(bob)                               # Run inherited __repr__
    bob.work()                               # Run type-specific action
    bob.giveRaise(0.20)                      # Give bob a 20% raise
    print(bob); print()

    for klass in Employee, Chef, Server, PizzaRobot:
        obj = klass(klass.__name__)
        obj.work()


When we run the self-test code included in this module, we create a pizza-making robot 
named bob, which inherits names from three classes: PizzaRobot, Chef, and Employee. 
For instance, printing bob runs the Employee.__repr__ method, and giving bob a raise 
invokes Employee.giveRaise because that’s where the inheritance search finds that 
method: 


C:\python\examples> python employees.py
<Employee: name=bob, salary=50000>
bob makes pizza
<Employee: name=bob, salary=60000.0>

Employee does stuff
Chef makes food
Server interfaces with customer
PizzaRobot makes pizza


In a class hierarchy like this, you can usually make instances of any of the classes, not 
just the ones at the bottom. For instance, the for loop in this module’s self-test code 
creates instances of all four classes; each responds differently when asked to work be- 
cause the work method is different in each. Really, these classes just simulate real-world 
objects; work prints a message for the time being, but it could be expanded to do real 
work later. 


OOP and Composition: “Has-a” Relationships 


The notion of composition was introduced in Chapter 25. From a programmer’s point 
of view, composition involves embedding other objects in a container object, and ac- 
tivating them to implement container methods. To a designer, composition is another 
way to represent relationships in a problem domain. But, rather than set membership, 
composition has to do with components—parts of a whole. 


Composition also reflects the relationships between parts, called “has-a”
relationships. Some OOP design texts refer to composition as aggregation (or distinguish
between the two terms by using aggregation to describe a weaker dependency between
container and contained); in this text, a “composition” simply refers to a collection of
embedded objects. The composite class generally provides an interface all its own and
implements it by directing the embedded objects.


Now that we’ve implemented our employees, let’s put them in the pizza shop and let 
them get busy. Our pizza shop is a composite object: it has an oven, and it has employees
like servers and chefs. When a customer enters and places an order, the components 
of the shop spring into action—the server takes the order, the chef makes the pizza, 
and so on. The following example (the file pizzashop.py) simulates all the objects and 
relationships in this scenario: 


from employees import PizzaRobot, Server

class Customer:
    def __init__(self, name):
        self.name = name
    def order(self, server):
        print(self.name, "orders from", server)
    def pay(self, server):
        print(self.name, "pays for item to", server)

class Oven:
    def bake(self):
        print("oven bakes")

class PizzaShop:
    def __init__(self):
        self.server = Server('Pat')          # Embed other objects
        self.chef = PizzaRobot('Bob')        # A robot named bob
        self.oven = Oven()

    def order(self, name):
        customer = Customer(name)            # Activate other objects
        customer.order(self.server)          # Customer orders from server
        self.chef.work()
        self.oven.bake()
        customer.pay(self.server)

if __name__ == "__main__":
    scene = PizzaShop()                      # Make the composite
    scene.order('Homer')                     # Simulate Homer's order
    print('...')
    scene.order('Shaggy')                    # Simulate Shaggy's order


The PizzaShop class is a container and controller; its constructor makes and embeds 
instances of the employee classes we wrote in the last section, as well as an Oven class 
defined here. When this module’s self-test code calls the PizzaShop order method, the 
embedded objects are asked to carry out their actions in turn. Notice that we make a 
new Customer object for each order, and we pass on the embedded Server object to 
Customer methods; customers come and go, but the server is part of the pizza shop 
composite. Also notice that employees are still involved in an inheritance relationship; 
composition and inheritance are complementary tools. 


When we run this module, our pizza shop handles two orders, one from Homer and 
then one from Shaggy: 


C:\python\examples> python pizzashop.py
Homer orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
Homer pays for item to <Employee: name=Pat, salary=40000>
...
Shaggy orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
Shaggy pays for item to <Employee: name=Pat, salary=40000>


Again, this is mostly just a toy simulation, but the objects and interactions are repre- 
sentative of composites at work. As a rule of thumb, classes can represent just about 
any objects and relationships you can express in a sentence; just replace nouns with 
classes, and verbs with methods, and you'll have a first cut at a design. 


Stream Processors Revisited 


For a more realistic composition example, recall the generic data stream processor 
function we partially coded in the introduction to OOP in Chapter 25: 


def processor(reader, converter, writer):
    while True:
        data = reader.read()
        if not data: break
        data = converter(data)
        writer.write(data)


Rather than using a simple function here, we might code this as a class that uses 
composition to do its work, which provides more structure and supports inheritance. 
The following file, streams.py, demonstrates one way to code the class: 


class Processor:
    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer
    def process(self):
        while True:
            data = self.reader.readline()
            if not data: break
            data = self.converter(data)
            self.writer.write(data)
    def converter(self, data):
        assert False, 'converter must be defined'    # Or raise exception


This class defines a converter method that it expects subclasses to fill in; it’s an example 
of the abstract superclass model we outlined in Chapter 28 (more on assert in 
Part VII). Coded this way, reader and writer objects are embedded within the class 
instance (composition), and we supply the conversion logic in a subclass rather than 
passing in a converter function (inheritance). The file converters.py shows how: 


from streams import Processor

class Uppercase(Processor):
    def converter(self, data):
        return data.upper()

if __name__ == '__main__':
    import sys
    obj = Uppercase(open('spam.txt'), sys.stdout)
    obj.process()


Here, the Uppercase class inherits the stream-processing loop logic (and anything else 
that may be coded in its superclasses). It needs to define only what is unique about it— 
the data conversion logic. When this file is run, it makes and runs an instance that reads 
from the file spam.txt and writes the uppercase equivalent of that file to the stdout 
stream: 


C:\lp4e> type spam.txt 
spam 
Spam 
SPAM! 


C:\lp4e> python converters.py 
SPAM 
SPAM 
SPAM! 


To process different sorts of streams, pass in different sorts of objects to the class 
construction call. Here, we use an output file instead of a stream: 


C:\lp4e> python
>>> import converters
>>> prog = converters.Uppercase(open('spam.txt'), open('spamup.txt', 'w'))
>>> prog.process()


C:\lp4e> type spamup.txt 
SPAM 
SPAM 
SPAM! 


But, as suggested earlier, we could also pass in arbitrary objects wrapped up in classes 
that define the required input and output method interfaces. Here’s a simple example 
that passes in a writer class that wraps up the text inside HTML tags: 


C:\lp4e> python
>>> from converters import Uppercase
>>>
>>> class HTMLize:
...     def write(self, line):
...         print('<PRE>%s</PRE>' % line.rstrip())
...
>>> Uppercase(open('spam.txt'), HTMLize()).process()
<PRE>SPAM</PRE>
<PRE>SPAM</PRE>
<PRE>SPAM!</PRE>


If you trace through this example’s control flow, you’ll see that we get both uppercase 
conversion (by inheritance) and HTML formatting (by composition), even though the 
core processing logic in the original Processor superclass knows nothing about either 
step. The processing code only cares that writers have a write method and that a method 
named converter is defined; it doesn’t care what those methods do when they are called. 
Such polymorphism and encapsulation of logic is behind much of the power of classes. 


As is, the Processor superclass only provides a file-scanning loop. In more realistic 
work, we might extend it to support additional programming tools for its subclasses, 
and, in the process, turn it into a full-blown framework. Coding such a tool once in a 
superclass enables you to reuse it in all of your programs. Even in this simple example, 
because so much is packaged and inherited with classes, all we had to code was the 
HTML formatting step; the rest was free. 
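
To experiment with this pattern without creating any files, the same classes can be run on in-memory streams. The following is only a minimal sketch under a few assumptions: it repeats the Processor and Uppercase classes so that it is self-contained, it raises NotImplementedError instead of using assert (one of the alternatives the original comment hints at), and ListWriter is a hypothetical helper invented here just to capture the output.

```python
import io

class Processor:
    def __init__(self, reader, writer):
        self.reader = reader                   # Embedded objects (composition)
        self.writer = writer
    def process(self):
        while True:
            data = self.reader.readline()
            if not data: break
            data = self.converter(data)
            self.writer.write(data)
    def converter(self, data):
        raise NotImplementedError('converter must be defined')

class Uppercase(Processor):
    def converter(self, data):                 # Conversion step (inheritance)
        return data.upper()

class ListWriter:
    """Hypothetical writer: collects stripped lines in a list."""
    def __init__(self):
        self.lines = []
    def write(self, line):
        self.lines.append(line.rstrip())

reader = io.StringIO('spam\nSpam\nSPAM!\n')    # Any object with readline works
writer = ListWriter()                          # Any object with write works
Uppercase(reader, writer).process()
print(writer.lines)                            # ['SPAM', 'SPAM', 'SPAM!']
```

Because the processing loop relies only on readline and write, the reader and writer here can be swapped for real files, sockets, or other wrappers without touching the superclass.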


For another example of composition at work, see exercise 9 at the end of Chapter 31 
and its solution in Appendix B; it’s similar to the pizza shop example. We’ve focused 
on inheritance in this book because that is the main tool that the Python language itself 
provides for OOP. But, in practice, composition is used as much as inheritance as a 
way to structure classes, especially in larger systems. As we’ve seen, inheritance and 
composition are often complementary (and sometimes alternative) techniques. Because 
composition is a design issue outside the scope of the Python language and this book, 
though, I’ll defer to other resources for more on this topic. 


Why You Will Care: Classes and Persistence 


I’ve mentioned Python’s pickle and shelve object persistence support a few times in 
this part of the book because it works especially well with class instances. In fact, these 
tools are often compelling enough to motivate the use of classes in general: by pickling 
or shelving a class instance, we get data storage that contains both data and logic 
combined. 


For example, besides allowing us to simulate real-world interactions, the pizza shop 
classes developed in this chapter could also be used as the basis of a persistent restaurant 
database. Instances of classes can be stored away on disk in a single step using Python’s 
pickle or shelve modules. We used shelves to store instances of classes in the OOP 
tutorial in Chapter 27, but the object pickling interface is remarkably easy to use as well: 


import pickle
object = someClass()
file   = open(filename, 'wb')       # Create external file
pickle.dump(object, file)           # Save object in file

import pickle
file   = open(filename, 'rb')
object = pickle.load(file)          # Fetch it back later


Pickling converts in-memory objects to serialized byte streams (really, byte strings), which 
may be stored in files, sent across a network, and so on; unpickling converts back from 
byte streams to equivalent in-memory objects. Shelves are similar, but they automatically 
pickle objects to an access-by-key database, which exports a dictionary-like interface: 


import shelve
object = someClass()
dbase  = shelve.open('filename')
dbase['key'] = object               # Save under key

import shelve
dbase  = shelve.open('filename')
object = dbase['key']               # Fetch it back later


In our pizza shop example, using classes to model employees means we can get a simple 
database of employees and shops with little extra work—pickling such instance objects 
to a file makes them persistent across Python program executions: 


>>> from pizzashop import PizzaShop
>>> shop = PizzaShop()
>>> shop.server, shop.chef
(<Employee: name=Pat, salary=40000>, <Employee: name=Bob, salary=50000>)
>>> import pickle
>>> pickle.dump(shop, open('shopfile.dat', 'wb'))


This stores an entire composite shop object in a file all at once. To bring it back later in 
another session or program, a single step suffices as well. In fact, objects restored this 
way retain both state and behavior: 


>>> import pickle
>>> obj = pickle.load(open('shopfile.dat', 'rb'))
>>> obj.server, obj.chef
(<Employee: name=Pat, salary=40000>, <Employee: name=Bob, salary=50000>)
>>> obj.order('Sue')
Sue orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
Sue pays for item to <Employee: name=Pat, salary=40000>


See the standard library manual and later examples for more on pickles and shelves. 


OOP and Delegation: “Wrapper” Objects 


Besides inheritance and composition, object-oriented programmers often also talk about 
something called delegation, which usually implies controller objects that embed other 
objects to which they pass off operation requests. The controllers can take care of 
administrative activities, such as keeping track of accesses and so on. In Python, 
delegation is often implemented with the __getattr__ method hook; because it intercepts 
accesses to nonexistent attributes, a wrapper class (sometimes called a proxy class) can 
use __getattr__ to route arbitrary accesses to a wrapped object. The wrapper class 
retains the interface of the wrapped object and may add additional operations of its 
own. 


Consider the file trace.py, for instance: 


class wrapper:
    def __init__(self, object):
        self.wrapped = object                    # Save object
    def __getattr__(self, attrname):
        print('Trace:', attrname)                # Trace fetch
        return getattr(self.wrapped, attrname)   # Delegate fetch


Recall from Chapter 29 that __getattr__ gets the attribute name as a string. This code 
makes use of the getattr built-in function to fetch an attribute from the wrapped object 
by name string: getattr(X,N) is like X.N, except that N is an expression that evaluates 
to a string at runtime, not a variable. In fact, getattr(X,N) is similar to X.__dict__[N], 
but the former also performs an inheritance search, like X.N, while the latter does not 
(see “Namespace Dictionaries” on page 696 for more on the __dict__ attribute). 
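
The difference between the two lookups is easy to see in a small experiment. The following is a minimal sketch; the Base and Item classes and their attribute names are hypothetical, made up only for this illustration.

```python
class Base:
    kind = 'base'                      # Class attribute, reachable by inheritance

class Item(Base):
    def __init__(self):
        self.name = 'spam'             # Instance attribute, stored in obj.__dict__

obj = Item()
attrname = 'name'                      # Attribute name held as a string

print(getattr(obj, attrname))          # spam: same as obj.name
print(obj.__dict__[attrname])          # spam: instance namespace only
print(getattr(obj, 'kind'))            # base: found by inheritance search
print('kind' in obj.__dict__)          # False: not stored in the instance
```

Indexing obj.__dict__ with 'kind' would raise a KeyError here, which is why the wrapper class uses getattr to delegate fetches.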


You can use the approach of this module’s wrapper class to manage access to any object 
with attributes—lists, dictionaries, and even classes and instances. Here, the wrapper 
class simply prints a trace message on each attribute access and delegates the attribute 
request to the embedded wrapped object: 


>>> from trace import wrapper
>>> x = wrapper([1,2,3])                     # Wrap a list
>>> x.append(4)                              # Delegate to list method
Trace: append
>>> x.wrapped                                # Print my member
[1, 2, 3, 4]

>>> x = wrapper({"a": 1, "b": 2})            # Wrap a dictionary
>>> list(x.keys())                           # Delegate to dictionary method
Trace: keys
['a', 'b']


The net effect is to augment the entire interface of the wrapped object, with additional 
code in the wrapper class. We can use this to log our method calls, route method calls 
to extra or custom logic, and so on. 


We'll revive the notions of wrapped objects and delegated operations as one way to 
extend built-in types in Chapter 31. If you are interested in the delegation design pat- 
tern, also watch for the discussions in Chapters 31 and 38 of function decorators, a 
strongly related concept designed to augment a specific function or method call rather 
than the entire interface of an object, and class decorators, which serve as a way to 
automatically add such delegation-based wrappers to all instances of a class. 


Version skew note: In Python 2.6, operator overloading methods run by 
built-in operations are routed through generic attribute interception 
methods like __getattr__. Printing a wrapped object directly, for 
example, calls this method for __repr__ or __str__, which then passes the 
call on to the wrapped object. In Python 3.0, this no longer happens: 
printing does not trigger __getattr__, and a default display is used 
instead. In 3.0, new-style classes look up operator overloading methods 
in classes and skip the normal instance lookup entirely. We’ll return to 
this issue in Chapter 37, in the context of managed attributes; for now, 
keep in mind that you may need to redefine operator overloading 
methods in wrapper classes (either by hand, by tools, or by superclasses) if 
you want them to be intercepted in 3.0. 


Pseudoprivate Class Attributes 


Besides larger structuring goals, class designs often must address name usage too. In 
Part V, we learned that every name assigned at the top level of a module file is exported. 
By default, the same holds for classes—data hiding is a convention, and clients may 
fetch or change any class or instance attribute they like. In fact, attributes are all “pub- 
lic” and “virtual,” in C++ terms; they’re all accessible everywhere and are looked up 
dynamically at runtime.* 


That said, Python today does support the notion of name “mangling” (i.e., expansion) 
to localize some names in classes. Mangled names are sometimes misleadingly called 
“private attributes,” but really this is just a way to localize a name to the class that 
created it—name mangling does not prevent access by code outside the class. This 
feature is mostly intended to avoid namespace collisions in instances, not to restrict 
access to names in general; mangled names are therefore better called “pseudoprivate” 
than “private.” 


Pseudoprivate names are an advanced and entirely optional feature, and you probably 
won't find them very useful until you start writing general tools or larger class hierar- 
chies for use in multiprogrammer projects. In fact, they are not always used even when 
they probably should be—more commonly, Python programmers code internal names 
with a single underscore (e.g., _X), which is just an informal convention to let you know 
that a name shouldn’t be changed (it means nothing to Python itself). 


Because you may see this feature in other people’s code, though, you need to be some- 
what aware of it, even if you don’t use it yourself. 


* This tends to scare people with a C++ background unnecessarily. In Python, it’s even possible to change or 
completely delete a class method at runtime. On the other hand, almost nobody ever does this in practical 
programs. As a scripting language, Python is more about enabling than restricting. Also, recall from our 
discussion of operator overloading in Chapter 29 that __getattr__ and __setattr__ can be used to emulate 
privacy, but are generally not used for this purpose in practice. More on this when we code a more realistic 
privacy decorator in Chapter 38. 


Name Mangling Overview 


Here’s how name mangling works: names inside a class statement that start with two 
underscores but don’t end with two underscores are automatically expanded to include 
the name of the enclosing class. For instance, a name like __X within a class named 
Spam is changed to _Spam__X automatically: the original name is prefixed with a single 
underscore and the enclosing class’s name. Because the modified name contains the 
name of the enclosing class, it’s somewhat unique; it won’t clash with similar names 
created by other classes in a hierarchy. 


Name mangling happens only in class statements, and only for names that begin with 
two leading underscores. However, it happens for every name preceded with double 
underscores: both class attributes (like method names) and instance attribute names 
assigned to self attributes. For example, in a class named Spam, a method named 
__meth is mangled to _Spam__meth, and an instance attribute reference self.__X is 
transformed to self._Spam__X. Because more than one class may add attributes to an 
instance, this mangling helps avoid clashes, but we need to move on to an example to 
see how. 
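
Before moving on, the expansion itself can be observed directly in the class and instance namespaces. This is a minimal sketch; the show method is a hypothetical addition made only so that the mangled names are used from inside the class.

```python
class Spam:
    def __init__(self):
        self.__X = 1                  # Stored as _Spam__X in the instance
    def __meth(self):                 # Stored as _Spam__meth in the class
        return self.__X
    def show(self):
        return self.__meth()          # Mangled consistently inside the class

obj = Spam()
print(sorted(vars(obj)))              # ['_Spam__X']
print('_Spam__meth' in vars(Spam))    # True
print(obj.show())                     # 1
print(hasattr(obj, '__X'))            # False: no such name outside the class
print(obj._Spam__X)                   # 1: the expanded name is still reachable
```

The last line shows why these names are only pseudoprivate: the expanded name is an ordinary attribute that any code can fetch.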


Why Use Pseudoprivate Attributes? 


One of the main problems that the pseudoprivate attribute feature is meant to alleviate 
has to do with the way instance attributes are stored. In Python, all instance attributes 
wind up in the single instance object at the bottom of the class tree. This is different 
from the C++ model, where each class gets its own space for data members it defines. 


Within a class method in Python, whenever a method assigns to a self attribute (e.g., 
self.attr = value), it changes or creates an attribute in the instance (inheritance 
searches happen only on reference, not on assignment). Because this is true even if 
multiple classes in a hierarchy assign to the same attribute, collisions are possible. 


For example, suppose that when a programmer codes a class, she assumes that she 
owns the attribute name X in the instance. In this class’s methods, the name is set, and 
later fetched: 

class C1:
    def meth1(self): self.X = 88         # I assume X is mine
    def meth2(self): print(self.X)


Suppose further that another programmer, working in isolation, makes the same as- 
sumption in a class that he codes: 


class C2:
    def metha(self): self.X = 99         # Me too
    def methb(self): print(self.X)


Both of these classes work by themselves. The problem arises if the two classes are ever 
mixed together in the same class tree: 


class C3(C1, C2): ...
I = C3()                                 # Only 1 X in I!


Now, the value that each class gets back when it says self.X will depend on which class 
assigned it last. Because all assignments to self.X refer to the same single instance, 
there is only one X attribute (I.X), no matter how many classes use that attribute name. 


To guarantee that an attribute belongs to the class that uses it, prefix the name with 
double underscores everywhere it is used in the class, as in this file, private.py: 


class C1:
    def meth1(self): self.__X = 88       # Now X is mine
    def meth2(self): print(self.__X)     # Becomes _C1__X in I
class C2:
    def metha(self): self.__X = 99       # Me too
    def methb(self): print(self.__X)     # Becomes _C2__X in I

class C3(C1, C2): pass
I = C3()                                 # Two X names in I

I.meth1(); I.metha()
print(I.__dict__)
I.meth2(); I.methb()


When thus prefixed, the X attributes will be expanded to include the names of their 
classes before being added to the instance. If you run a dir call on I or inspect its 
namespace dictionary after the attributes have been assigned, you’ll see the expanded 
names, _C1__X and _C2__X, but not X. Because the expansion makes the names unique 
within the instance, the class coders can safely assume that they truly own any names 
that they prefix with two underscores: 

% python private.py
{'_C2__X': 99, '_C1__X': 88}
88
99


This trick can avoid potential name collisions in the instance, but note that it does not 
amount to true privacy. If you know the name of the enclosing class, you can still access 
either of these attributes anywhere you have a reference to the instance by using the 
fully expanded name (e.g., I._C1__X = 77). On the other hand, this feature makes it 
less likely that you will accidentally step on a class’s names. 


Pseudoprivate attributes are also useful in larger frameworks or tools, both to avoid 
introducing new method names that might accidentally hide definitions elsewhere in 
the class tree and to reduce the chance of internal methods being replaced by names 
defined lower in the tree. If a method is intended for use only within a class that may 
be mixed into other classes, the double underscore prefix ensures that the method won’t 
interfere with other names in the tree, especially in multiple-inheritance scenarios: 


class Super:
    def method(self): ...                    # A real application method

class Tool:
    def __method(self): ...                  # Becomes _Tool__method
    def other(self): self.__method()         # Use my internal method

class Sub1(Tool, Super):
    def actions(self): self.method()         # Runs Super.method as expected

class Sub2(Tool):
    def __init__(self): self.method = 99     # Doesn't break Tool.__method


We met multiple inheritance briefly in Chapter 25 and will explore it in more detail 
later in this chapter. Recall that superclasses are searched according to their left-to-right 
order in class header lines. Here, this means Sub1 prefers Tool attributes to those in 
Super. Although in this example we could force Python to pick the application class’s 
methods first by switching the order of the superclasses listed in the Sub1 class header, 
pseudoprivate attributes resolve the issue altogether. Pseudoprivate names also prevent 
subclasses from accidentally redefining the internal method’s names, as in Sub2. 
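
The outline above can be filled in to watch the effect. The following is a minimal runnable sketch of the same class tree; the returned strings are placeholders invented here so that each call reports which method actually ran.

```python
class Super:
    def method(self):                         # A real application method
        return 'Super.method'

class Tool:
    def __method(self):                       # Becomes _Tool__method
        return 'Tool internal'
    def other(self):
        return self.__method()                # Always uses _Tool__method

class Sub1(Tool, Super):
    def actions(self):
        return self.method()                  # Finds Super.method

class Sub2(Tool):
    def __init__(self):
        self.method = 99                      # Does not disturb _Tool__method

print(Sub1().actions())                       # Super.method
print(Sub1().other())                         # Tool internal
print(Sub2().other())                         # Tool internal
```

Had Tool named its internal method plain method, Sub1 would have found Tool's version first, and Sub2's assignment would have hidden it in the instance.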


Again, I should note that this feature tends to be of use primarily for larger, 
multiprogrammer projects, and then only for selected names. Don’t be tempted to 
clutter your code unnecessarily; only use this feature for names that truly need to be 
controlled by a single class. For simpler programs, it’s probably overkill. 


For more examples that make use of the __X naming feature, see the lister.py 
mix-in classes introduced later in this chapter, in the section on multiple inheritance, 
as well as the discussion of Private class decorators in Chapter 38. If you 
care about privacy in general, you might want to review the emulation of 
private instance attributes sketched in the section “Attribute Reference: __getattr__ 
and __setattr__” on page 718 in Chapter 29, and watch for the Private class decorator 
in Chapter 38 that we will base upon this special method. Although it’s possible to 
emulate true access controls in Python classes, this is rarely done in practice, even for 
large systems. 


Methods Are Objects: Bound or Unbound 


Methods in general, and bound methods in particular, simplify the implementation of 
many design goals in Python. We met bound methods briefly while studying __call__ in 
Chapter 29. The full story, which we’ll flesh out here, turns out to be more general and 
flexible than you might expect. 


In Chapter 19, we learned how functions can be processed as normal objects. Methods 
are a kind of object too, and can be used generically in much the same way as other 
objects—they can be assigned, passed to functions, stored in data structures, and so 
on. Because class methods can be accessed from an instance or a class, though, they 
actually come in two flavors in Python: 


Unbound class method objects: no self 
    Accessing a function attribute of a class by qualifying the class returns an unbound 
    method object. To call the method, you must provide an instance object explicitly 
    as the first argument. In Python 3.0, an unbound method is the same as a simple 
    function and can be called through the class’s name; in 2.6 it’s a distinct type and 
    cannot be called without providing an instance. 

Bound instance method objects: self + function pairs 
    Accessing a function attribute of a class by qualifying an instance returns a bound 
    method object. Python automatically packages the instance with the function in 
    the bound method object, so you don’t need to pass an instance to call the method. 


Both kinds of methods are full-fledged objects; they can be transferred around a pro- 
gram at will, just like strings and numbers. Both also require an instance in their first 
argument when run (i.e., a value for self). This is why we had to pass in an instance 
explicitly when calling superclass methods from subclass methods in the previous 
chapter; technically, such calls produce unbound method objects. 


When calling a bound method object, Python provides an instance for you automati- 
cally—the instance used to create the bound method object. This means that bound 
method objects are usually interchangeable with simple function objects, and makes 
them especially useful for interfaces originally written for functions (see the sidebar 
“Why You Will Care: Bound Methods and Callbacks” on page 756 for a realistic 
example). 
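
As a small illustration of this interchangeability, the sketch below passes a bound method to code that was written with plain functions in mind. The for_each helper and the Collector class are hypothetical names invented for this example.

```python
def for_each(items, callback):
    """Hypothetical helper written for plain one-argument functions."""
    for item in items:
        callback(item)

class Collector:
    def __init__(self):
        self.seen = []
    def record(self, item):
        self.seen.append(item.upper())         # Uses state kept in the instance

collector = Collector()
for_each(['spam', 'eggs'], collector.record)   # Bound method: self comes along
for_each(['ham'], print)                       # A simple function works the same way
print(collector.seen)                          # ['SPAM', 'EGGS']
```

The helper never sees the Collector instance; it is carried inside the bound method object, which is what lets the callback retain state between calls.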


To illustrate, suppose we define the following class: 


class Spam:
    def doit(self, message):
        print(message)

Now, in normal operation, we make an instance and call its method in a single step to 
print the passed-in argument: 

object1 = Spam()
object1.doit('hello world')


Really, though, a bound method object is generated along the way, just before the 
method call’s parentheses. In fact, we can fetch a bound method without actually call- 
ing it. An object.name qualification is an object expression. In the following, it returns 
a bound method object that packages the instance (object1) with the method function 
(Spam.doit). We can assign this bound method pair to another name and then call it as 
though it were a simple function: 

object1 = Spam()
x = object1.doit                    # Bound method object: instance+function
x('hello world')                    # Same effect as object1.doit('...')


On the other hand, if we qualify the class to get to doit, we get back an unbound method 
object, which is simply a reference to the function object. To call this type of method, 
we must pass in an instance as the leftmost argument: 

object1 = Spam()
t = Spam.doit                       # Unbound method object (a function in 3.0: see ahead)
t(object1, 'howdy')                 # Pass in instance (if the method expects one in 3.0)


By extension, the same rules apply within a class’s method if we reference self attributes 
that refer to functions in the class. A self.method expression is a bound method object 
because self is an instance object: 


class Eggs:
    def m1(self, n):
        print(n)
    def m2(self):
        x = self.m1                 # Another bound method object
        x(42)                       # Looks like a simple function

Eggs().m2()                         # Prints 42


Most of the time, you call methods immediately after fetching them with attribute 
qualification, so you don’t always notice the method objects generated along the way. 
But if you start writing code that calls objects generically, you need to be careful to treat 
unbound methods specially—they normally require an explicit instance object to be 
passed in.† 


Unbound Methods are Functions in 3.0 


In Python 3.0, the language has dropped the notion of unbound methods. What we 
describe as an unbound method here is treated as a simple function in 3.0. For most 
purposes, this makes no difference to your code; either way, an instance will be passed 
to a method’s first argument when it’s called through an instance. 


Programs that do explicit type testing might be impacted, though: if you print the type 
of an instance-less class method, it displays “unbound method” in 2.6, and “function” 
in 3.0. 
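
The following minimal sketch, assuming Python 3.X, shows what such type tests report. It reuses the Spam class from earlier in this section (returning the message rather than printing it, a change made here only to keep the output short) and relies on the standard library's types module.

```python
import types

class Spam:
    def doit(self, message):
        return message

obj = Spam()
print(type(Spam.doit).__name__)                    # function: fetched from the class
print(type(obj.doit).__name__)                     # method: fetched from an instance
print(isinstance(Spam.doit, types.FunctionType))   # True
print(isinstance(obj.doit, types.MethodType))      # True
print(obj.doit.__func__ is Spam.doit)              # True: same underlying function
```

Under 2.6 the class-level fetch would report an unbound method type instead, so code that branches on these types may need adjusting when it is ported.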


Moreover, in 3.0 it is OK to call a method without an instance, as long as the method 
does not expect one and you call it only through the class and never through an instance. 
That is, Python 3.0 will pass along an instance to methods only for through-instance 
calls. When calling through a class, you must pass an instance manually only if the 
method expects one: 


C:\misc> c:\python30\python
>>> class Selfless:
...     def __init__(self, data):
...         self.data = data
...     def selfless(arg1, arg2):               # A simple function in 3.0
...         return arg1 + arg2
...     def normal(self, arg1, arg2):           # Instance expected when called
...         return self.data + arg1 + arg2
...
>>> X = Selfless(2)
>>> X.normal(3, 4)                  # Instance passed to self automatically
9
>>> Selfless.normal(X, 3, 4)        # self expected by method: pass manually
9
>>> Selfless.selfless(3, 4)         # No instance: works in 3.0, fails in 2.6!
7


† See the discussion of static and class methods in Chapter 31 for an optional exception to this rule. Like bound 
methods, static methods can masquerade as basic functions because they do not expect instances when called. 
Python supports three kinds of class methods (instance, static, and class), and 3.0 allows simple functions 
in classes, too. 


The last test in this session fails in 2.6, because unbound methods require an instance to be 
passed by default; it works in 3.0 because such methods are treated as simple functions 
not requiring an instance. Although this removes some potential error trapping in 3.0 
(what if a programmer accidentally forgets to pass an instance?), it allows class methods 
to be used as simple functions as long as they are not passed and do not expect a “self” 
instance argument. 


The following two calls still fail in both 3.0 and 2.6, though—the first (calling through 
an instance) automatically passes an instance to a method that does not expect one, 
while the second (calling through a class) does not pass an instance to a method that 
does expect one: 


>>> X.selfless(3, 4) 
TypeError: selfless() takes exactly 2 positional arguments (3 given) 


>>> Selfless.normal(3, 4) 
TypeError: normal() takes exactly 3 positional arguments (2 given) 


Because of this change, the staticmethod decorator described in the next chapter is not 
needed in 3.0 for methods without a self argument that are called only through the 
class name, and never through an instance—such methods are run as simple functions, 
without receiving an instance argument. In 2.6, such calls are errors unless an instance 
is passed manually (more on static methods in the next chapter). 
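
As a preview, and only as a minimal sketch assuming Python 3.X, the following contrasts a simple function in a class with one wrapped by the built-in staticmethod decorator. The method names plain and static are hypothetical, chosen for this illustration.

```python
class Selfless:
    def plain(arg1, arg2):              # Simple function: class calls only in 3.X
        return arg1 + arg2

    @staticmethod
    def static(arg1, arg2):             # Static method: class or instance calls
        return arg1 + arg2

X = Selfless()
print(Selfless.plain(3, 4))             # 7
print(Selfless.static(3, 4))            # 7
print(X.static(3, 4))                   # 7: no instance is passed
try:
    X.plain(3, 4)                       # Instance passed to a 2-argument function
except TypeError as exc:
    print('TypeError:', exc)
```

The decorator matters only when the method may also be called through an instance; for class-only calls in 3.X, the undecorated function behaves the same way.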


It’s important to be aware of the differences in behavior in 3.0, but bound methods are 
generally more important from a practical perspective anyway. Because they pair to- 
gether the instance and function in a single object, they can be treated as callables 
generically. The next section demonstrates what this means in code. 


For a more visual illustration of unbound method treatment in Python 
3.0 and 2.6, see also the lister.py example in the multiple inheritance 
section later in this chapter. Its classes print the value of methods fetched 
from both instances and classes, in both versions of Python. 


Bound Methods and Other Callable Objects 


As mentioned earlier, bound methods can be processed as generic objects, just like 
simple functions—they can be passed around a program arbitrarily. Moreover, because 
bound methods combine both a function and an instance in a single package, they can 
be treated like any other callable object and require no special syntax when invoked. 
The following, for example, stores four bound method objects in a list and calls them 
later with normal call expressions: 


>>> class Number:
...     def __init__(self, base):
...         self.base = base
...     def double(self):
...         return self.base * 2
...     def triple(self):
...         return self.base * 3
...
>>> x = Number(2)                                       # Class instance objects
>>> y = Number(3)                                       # State + methods
>>> z = Number(4)
>>> x.double()                                          # Normal immediate calls
4
>>> acts = [x.double, y.double, y.triple, z.double]     # List of bound methods
>>> for act in acts:                                    # Calls are deferred
...     print(act())                                    # Call as though functions
...
4
6
9
8


Like simple functions, bound method objects have introspection information of their 
own, including attributes that give access to the instance object and method function 
they pair. Calling the bound method simply dispatches the pair: 


>>> bound = x.double
>>> bound.__self__, bound.__func__
(<__main__.Number object at 0x0278F610>, <function double at 0x027A4ED0>)
>>> bound.__self__.base
2
>>> bound()                             # Calls bound.__func__(bound.__self__, ...)
4
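
The equivalence noted in the last comment can be checked directly. The following is a minimal sketch that assumes Python 3.X and recodes a trimmed Number class so that it runs on its own; the names via_bound and via_pair are illustrative only:

```python
class Number:
    def __init__(self, base):
        self.base = base
    def double(self):
        return self.base * 2

x = Number(2)
bound = x.double                        # Fetching pairs instance + function

# A bound method call is the same as calling the function on the instance
via_bound = bound()
via_pair = bound.__func__(bound.__self__)
print(via_bound, via_pair)              # 4 4

# The paired pieces are the original objects, not copies
print(bound.__self__ is x)              # True
print(bound.__func__ is Number.double)  # True in 3.X: class fetch is a plain function
```

The last test depends on the 3.X model, in which a method fetched from a class is a simple function; in 2.6 the same fetch yields an unbound method object instead.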


In fact, bound methods are just one of a handful of callable object types in Python. As 
the following demonstrates, simple functions coded with a def or lambda, instances that 
inherit a __call__, and bound instance methods can all be treated and called the same
way: 
>>> def square(arg):
        return arg ** 2                         # Simple functions (def or lambda)

>>> class Sum:
        def __init__(self, val):                # Callable instances
            self.val = val
        def __call__(self, arg):
            return self.val + arg

>>> class Product:
        def __init__(self, val):                # Bound methods
            self.val = val
        def method(self, arg):
            return self.val * arg

>>> sobject = Sum(2)
>>> pobject = Product(3)
>>> actions = [square, sobject, pobject.method] # Function, instance, method

>>> for act in actions:                         # All 3 called same way
        print(act(5))                           # Call any 1-arg callable

25
7
15
>>> actions[-1](5)                              # Index, comprehensions, maps
15
>>> [act(5) for act in actions]
[25, 7, 15]
>>> list(map(lambda act: act(5), actions))
[25, 7, 15]


Technically speaking, classes belong in the callable objects category too, but we nor- 
mally call them to generate instances rather than to do actual work, as shown here: 


>>> class Negate:
        def __init__(self, val):                # Classes are callables too
            self.val = -val                     # But called for object, not work
        def __repr__(self):                     # Instance print format
            return str(self.val)

>>> actions = [square, sobject, pobject.method, Negate]     # Call a class too
>>> for act in actions:
        print(act(5))

25
7
15
-5
>>> [act(5) for act in actions]                 # Runs __repr__ not __str__!
[25, 7, 15, -5]

>>> table = {act(5): act for act in actions}    # 3.0 (and 2.7) dict comprehension
>>> for (key, value) in table.items():
        print('{0:2} => {1}'.format(key, value))    # 2.6/3.0 str.format

-5 => <class '__main__.Negate'>
25 => <function square at 0x025D4978>
15 => <bound method Product.method of <__main__.Product object at 0x025D0F90>>
 7 => <__main__.Sum object at 0x025D0F70>
As you can see, bound methods, and Python’s callable objects model in general, are
some of the many ways that Python’s design makes for an incredibly flexible language. 


You should now understand the method object model. For other examples of bound 
methods at work, see the upcoming sidebar “Why You Will Care: Bound Methods and 
Callbacks” as well as the prior chapter’s discussion of callback handlers in the section 
on the method __call__.


Why You Will Care: Bound Methods and Callbacks 


Because bound methods automatically pair an instance with a class method function, 
you can use them anywhere a simple function is expected. One of the most common 
places you'll see this idea put to work is in code that registers methods as event callback 
handlers in the tkinter GUI interface (named Tkinter in Python 2.6). Here’s the simple 
case: 


def handler():
    ...use globals for state...

widget = Button(text='spam', command=handler)


To register a handler for button click events, we usually pass a callable object that takes 
no arguments to the command keyword argument. Function names (and lambdas) work 
here, and so do class methods, as long as they are bound methods: 
class MyWidget:
    def handler(self):
        ...use self.attr for state...
    def makewidgets(self):
        b = Button(text='spam', command=self.handler)


Here, the event handler is self.handler, a bound method object that remembers both
self and MyWidget.handler. Because self will refer to the original instance when handler
is later invoked on events, the method will have access to instance attributes that can 
retain state between events. With simple functions, state normally must be retained in 
global variables or enclosing function scopes instead. See also the discussion of 
__call__ operator overloading in Chapter 29 for another way to make classes compat- 
ible with function-based APIs. 
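
The state-retention idea can be tried without a GUI. The sketch below is not tkinter code: the fire function is a hypothetical stand-in for an event loop that knows only that each registered object can be called with no arguments, and the Counter class is invented for illustration:

```python
class Counter:
    def __init__(self):
        self.count = 0                  # State retained between events
    def handler(self):
        self.count += 1                 # self is the instance that registered

def fire(callbacks, times):
    # Hypothetical stand-in for a GUI event loop: it only
    # calls each registered object with no arguments
    for _ in range(times):
        for callback in callbacks:
            callback()

a, b = Counter(), Counter()
fire([a.handler, b.handler, a.handler], times=2)
print(a.count, b.count)                 # 4 2
```

Each bound method updates only the instance it was fetched from, so the two counters keep separate state even though the caller never sees the instances themselves.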


Multiple Inheritance: “Mix-in” Classes 


Many class-based designs call for combining disparate sets of methods. In a class 
statement, more than one superclass can be listed in parentheses in the header line. 
When you do this, you use something called multiple inheritance—the class and its 
instances inherit names from all the listed superclasses. 


When searching for an attribute, Python’s inheritance search traverses all superclasses 
in the class header from left to right until a match is found. Technically, because any 
of the superclasses may have superclasses of its own, this search can be a bit more 
complex for larger class trees:


* In classic classes (the default until Python 3.0), the attribute search proceeds
depth-first all the way to the top of the inheritance tree, and then from left to right.


* In new-style classes (and all classes in 3.0), the attribute search proceeds across by
tree levels, in a more breadth-first fashion (see the new-style class discussion in the
next chapter).


In either model, though, when a class has multiple superclasses, they are searched from 
left to right according to the order listed in the class statement header lines. 
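
The left-to-right rule is easy to observe. The following minimal sketch assumes Python 3.X; the classes A through D are invented for illustration, and the __mro__ attribute used on the last line is a new-style class feature that this section has not introduced:

```python
class A:
    def who(self): return 'A'

class B:
    def who(self): return 'B'
    def only_b(self): return 'B only'

class C(A, B): pass                     # A is searched before B
class D(B, A): pass                     # B is searched before A

print(C().who(), D().who())             # A B: leftmost superclass wins
print(C().only_b())                     # B only: found in B after A is searched
print([cls.__name__ for cls in C.__mro__])
```

Swapping the order of superclasses in the header line is all it takes to change which version of a same-named method an instance inherits.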


In general, multiple inheritance is good for modeling objects that belong to more than 
one set. For instance, a person may be an engineer, a writer, a musician, and so on, and 
inherit properties from all such sets. With multiple inheritance, objects obtain the 
union of the behavior in all their superclasses. 


Perhaps the most common way multiple inheritance is used is to “mix in” general- 
purpose methods from superclasses. Such superclasses are usually called mix-in 
classes: they provide methods you add to application classes by inheritance. In a sense,
mix-in classes are similar to modules: they provide packages of methods for use in their 
client subclasses. Unlike simple functions in modules, though, methods in mix-ins also 
have access to the self instance, for using state information and other methods. The 
next section demonstrates one common use case for such tools. 


Coding Mix-in Display Classes 


As we’ve seen, Python’s default way to print a class instance object isn’t incredibly 
useful: 
>>> class Spam:
        def __init__(self):                 # No __repr__ or __str__
            self.data1 = "food"

>>> X = Spam()
>>> print(X)                                # Default: class, address
<__main__.Spam object at 0x00864818>        # Displays "instance" in Python 2.6


As you saw in Chapter 29 when studying operator overloading, you can provide a 
__str__ or __repr__ method to implement a custom string representation of your own.
But, rather than coding one of these in each and every class you wish to print, why not 
code it once in a general-purpose tool class and inherit it in all your classes? 


That’s what mix-ins are for. Defining a display method in a mix-in superclass once 
enables us to reuse it anywhere we want to see a custom display format. We’ve already 
seen tools that do related work: 
* Chapter 27’s AttrDisplay class formatted instance attributes in a generic __str__
method, but it did not climb class trees and was used in single-inheritance mode
only.


* Chapter 28’s classtree.py module defined functions for climbing and sketching
class trees, but it did not display object attributes along the way and was not
architected as an inheritable class.


Here, we’re going to revisit these examples’ techniques and expand upon them to code 
a set of three mix-in classes that serve as generic display tools for listing instance at- 
tributes, inherited attributes, and attributes on all objects in a class tree. We’ll also use 
our tools in multiple-inheritance mode and deploy coding techniques that make classes 
better suited to use as generic tools. 


Listing instance attributes with __dict__


Let’s get started with the simple case—listing attributes attached to an instance. The 
following class, coded in the file lister.py, defines a mix-in called ListInstance that 
overloads the __str__ method for all classes that include it in their header lines. Because
this is coded as a class, ListInstance is a generic tool whose formatting logic can be 
used for instances of any subclass: 


# File lister.py

class ListInstance:
    """
    Mix-in class that provides a formatted print() or str() of
    instances via inheritance of __str__, coded here; displays
    instance attrs only; self is the instance of lowest class;
    uses __X names to avoid clashing with client's attrs
    """
    def __str__(self):
        return '<Instance of %s, address %s:\n%s>' % (
                           self.__class__.__name__,         # My class's name
                           id(self),                        # My address
                           self.__attrnames())              # name=value list
    def __attrnames(self):
        result = ''
        for attr in sorted(self.__dict__):                  # Instance attr dict
            result += '\tname %s=%s\n' % (attr, self.__dict__[attr])
        return result


ListInstance uses some previously explored tricks to extract the instance’s class name 
and attributes: 


* Each instance has a built-in __class__ attribute that references the class from which
it was created, and each class has a __name__ attribute that references the name in
the header, so the expression self.__class__.__name__ fetches the name of an
instance’s class.


* This class does most of its work by simply scanning the instance’s attribute dictionary
(remember, it’s exported in __dict__) to build up a string showing the
names and values of all instance attributes. The dictionary’s keys are sorted to
finesse any ordering differences across Python releases.


In these respects, ListInstance is similar to Chapter 27’s attribute display; in fact, it’s 
largely just a variation on a theme. Our class here uses two additional techniques, 
though: 


* It displays the instance’s memory address by calling the id built-in function, which
returns any object’s address (by definition, a unique object identifier, which will
be useful in later mutations of this code).


* It uses the pseudoprivate naming pattern for its worker method: __attrnames. As
we learned earlier in this chapter, Python automatically localizes any such name to
its enclosing class by expanding the attribute name to include the class name (in
this case, it becomes _ListInstance__attrnames). This holds true for both class
attributes (like methods) and instance attributes attached to self. This behavior is
useful in a general tool like this, as it ensures that its names don’t clash with any
names used in its client subclasses.
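
To see why this expansion protects a tool class, consider the following minimal sketch. The Tool and Client classes and their __helper methods are invented for illustration; they stand in for a mix-in and a subclass that happens to choose the same method name:

```python
class Tool:
    def __helper(self):                 # Stored as _Tool__helper
        return 'tool helper'
    def run(self):
        return self.__helper()          # Expands to self._Tool__helper()

class Client(Tool):
    def __helper(self):                 # Stored as _Client__helper: no clash
        return 'client helper'

c = Client()
print(c.run())                          # tool helper
names = [n for n in dir(c) if n.endswith('__helper')]
print(names)                            # ['_Client__helper', '_Tool__helper']
```

Because each name is expanded with the name of the class whose code contains it, the run method in Tool still reaches its own helper, even though the subclass defines a method with the same source-code name.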


Because ListInstance defines a __str__ operator overloading method, instances derived
from this class display their attributes automatically when printed, giving a bit
more information than a simple address. Here is the class in action, in single-inheritance 
mode (this code works the same in both Python 3.0 and 2.6): 

>>> from lister import ListInstance
>>> class Spam(ListInstance):               # Inherit a __str__ method
        def __init__(self):
            self.data1 = 'food'

>>> x = Spam()
>>> print(x)                                # print() and str() run __str__
<Instance of Spam, address 40240880:
        name data1=food
>


You can also fetch the listing output as a string without printing it with str, and inter- 
active echoes still use the default format: 


>>> str(x)
'<Instance of Spam, address 40240880:\n\tname data1=food\n>'
>>> x                                       # The __repr__ still is a default
<__main__.Spam object at 0x026606F0>


The ListInstance class is useful for any classes you write—even classes that already 
have one or more superclasses. This is where multiple inheritance comes in handy: by 
adding ListInstance to the list of superclasses in a class header (i.e., mixing it in), you 
get its __str__ “for free” while still inheriting from the existing superclass(es). The file
testmixin.py demonstrates: 
# File testmixin.py

from lister import *                    # Get lister tool classes

class Super:
    def __init__(self):                 # Superclass __init__
        self.data1 = 'spam'             # Create instance attrs
    def ham(self):
        pass

class Sub(Super, ListInstance):         # Mix in ham and a __str__
    def __init__(self):                 # listers have access to self
        Super.__init__(self)
        self.data2 = 'eggs'             # More instance attrs
        self.data3 = 42
    def spam(self):                     # Define another method here
        pass

if __name__ == '__main__':
    X = Sub()
    print(X)                            # Run mixed-in __str__


Here, Sub inherits names from both Super and ListInstance; it’s a composite of its own 
names and names in both its superclasses. When you make a Sub instance and print it, 
you automatically get the custom representation mixed in from ListInstance (in this 
case, this script’s output is the same under both Python 3.0 and 2.6, except for object 
addresses): 
C:\misc> C:\python30\python testmixin.py
<Instance of Sub, address 40962576:
        name data1=spam
        name data2=eggs
        name data3=42
>


ListInstance works in any class it’s mixed into because self refers to an instance of 
the subclass that pulls this class in, whatever that may be. In a sense, mix-in classes are 
the class equivalent of modules—packages of methods useful in a variety of clients. For 
example, here is ListInstance working again in single-inheritance mode on a different class’s
instances, with import and attributes set outside the class: 


>>> import lister
>>> class C(lister.ListInstance): pass

>>> x = C()
>>> x.a = 1; x.b = 2; x.c = 3
>>> print(x)
<Instance of C, address 40961776:
        name a=1
        name b=2
        name c=3
>
Besides the utility they provide, mix-ins optimize code maintenance, like all classes do.
For example, if you later decide to extend ListInstance’s __str__ to also print all the
class attributes that an instance inherits, you’re safe; because it’s an inherited method,
changing __str__ automatically updates the display of each subclass that imports the
class and mixes it in. Since it’s now officially “later,” let’s move on to the next section 
to see what such an extension might look like. 


Listing inherited attributes with dir 


As it is, our ListInstance mix-in displays instance attributes only (i.e., names attached to the
instance object itself). It’s trivial to extend the class to display all the attributes accessible
from an instance, though: both its own and those it inherits from its classes. The
trick is to use the dir built-in function instead of scanning the instance’s __dict__ dictionary;
the latter holds instance attributes only, but the former also collects all inherited
attributes in Python 2.2 and later.


The following mutation codes this scheme; I’ve renamed it to facilitate simple testing, 
but if this were to replace the original version, all existing clients would pick up the new 
display automatically: 


# File lister.py, continued

class ListInherited:
    """
    Use dir() to collect both instance attrs and names
    inherited from its classes; Python 3.0 shows more
    names than 2.6 because of the implied object superclass
    in the new-style class model; getattr() fetches inherited
    names not in self.__dict__; use __str__, not __repr__,
    or else this loops when printing bound methods!
    """
    def __str__(self):
        return '<Instance of %s, address %s:\n%s>' % (
                           self.__class__.__name__,         # My class's name
                           id(self),                        # My address
                           self.__attrnames())              # name=value list
    def __attrnames(self):
        result = ''
        for attr in dir(self):                              # Instance dir()
            if attr[:2] == '__' and attr[-2:] == '__':      # Skip internals
                result += '\tname %s=<>\n' % attr
            else:
                result += '\tname %s=%s\n' % (attr, getattr(self, attr))
        return result


Notice that this code skips __X__ names’ values; most of these are internal names that
we don’t generally care about in a generic listing like this. This version also must use 
the getattr built-in function to fetch attributes by name string instead of using instance 
attribute dictionary indexing—getattr employs the inheritance search protocol, and 
some of the names we’re listing here are not stored on the instance itself. 
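
The difference between the two sources of names can be seen in isolation. This is a minimal sketch that assumes Python 3.X; the Super and Sub classes here are cut-down stand-ins for illustration, not the ones in testmixin.py:

```python
class Super:
    def ham(self): pass

class Sub(Super):
    def __init__(self):
        self.data = 'spam'

x = Sub()
own = sorted(x.__dict__)                              # Instance attributes only
seen = [n for n in dir(x) if not n.startswith('__')]  # Instance + inherited
print(own)                                            # ['data']
print(seen)                                           # ['data', 'ham']

# ham lives on a class, so dictionary indexing would fail; getattr searches
print('ham' in x.__dict__, hasattr(x, 'ham'))         # False True
print(getattr(x, 'data'))                             # spam
```

Indexing x.__dict__['ham'] would raise a KeyError here, which is why a lister based on dir must fetch values with getattr.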
To test the new version, change the testmixin.py file to use this new class instead:


class Sub(Super, ListInherited):        # Mix in a __str__


This file’s output varies per release. In Python 2.6, we get the following; notice the name 
mangling at work in the lister’s method name (I shortened its full value display to fit 
on this page): 


C:\misc> c:\python26\python testmixin.py
<Instance of Sub, address 40073136:
        name _ListInherited__attrnames=<bound method Sub.__attrnames of <...more...>>
        name __doc__=<>
        name __init__=<>
        name __module__=<>
        name __str__=<>
        name data1=spam
        name data2=eggs
        name data3=42
        name ham=<bound method Sub.ham of <__main__.Sub instance at 0x026377B0>>
        name spam=<bound method Sub.spam of <__main__.Sub instance at 0x026377B0>>
>


In Python 3.0, more attributes are displayed because all classes are “new-style” and 
inherit names from the implied object superclass (more on this in Chapter 31). Because 
so many names are inherited from the default superclass, I’ve omitted many here; run 
this on your own for the full listing: 


C:\misc> c:\python30\python testmixin.py
<Instance of Sub, address 40831792:
        name _ListInherited__attrnames=<bound method Sub.__attrnames of <...more...>>
        name __class__=<>
        name __delattr__=<>
        name __dict__=<>
        name __doc__=<>
        name __eq__=<>
        ...more names omitted...
        name __repr__=<>
        name __setattr__=<>
        name __sizeof__=<>
        name __str__=<>
        name __subclasshook__=<>
        name __weakref__=<>
        name data1=spam
        name data2=eggs
        name data3=42
        name ham=<bound method Sub.ham of <__main__.Sub object at 0x026F0B30>>
        name spam=<bound method Sub.spam of <__main__.Sub object at 0x026F0B30>>
>


One caution here: now that we’re displaying inherited methods too, we have to use
__str__ instead of __repr__ to overload printing. With __repr__, this code will loop:
displaying the value of a bound method triggers the __repr__ of the instance it is bound
to, in order to display that instance. That is, if the lister’s __repr__ tries to display a
method, displaying the method’s instance will trigger the lister’s __repr__ again. Subtle,
but true! Change __str__ to __repr__ here to see this for yourself. If you must use __repr__ in such a
context, you can avoid the loops by using isinstance to compare the type of attribute
values against types.MethodType in the standard library, to know which items to skip.
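
The isinstance test just described might be coded along the following lines. This is only a sketch under stated assumptions: the class name ListInheritedRepr and the <method> placeholder text are invented for illustration and are not part of lister.py, and the code assumes Python 3.X:

```python
import types

class ListInheritedRepr:
    # Hypothetical __repr__ variant of ListInherited, for illustration only
    def __repr__(self):
        return '<Instance of %s:\n%s>' % (
            self.__class__.__name__, self.__attrnames())

    def __attrnames(self):
        result = ''
        for attr in dir(self):
            if attr[:2] == '__' and attr[-2:] == '__':
                result += '\tname %s=<>\n' % attr
                continue
            value = getattr(self, attr)
            if isinstance(value, types.MethodType):
                # Skip bound methods: displaying one displays self again
                result += '\tname %s=<method>\n' % attr
            else:
                result += '\tname %s=%r\n' % (attr, value)
        return result

class Sub(ListInheritedRepr):
    def __init__(self):
        self.data1 = 'spam'
    def ham(self):
        pass

text = repr(Sub())                      # Returns instead of recursing
print(text)
```

Fetching a bound method with getattr is harmless; it is only formatting the method's value that would call back into the lister, so the test is applied before any value is displayed.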


Listing attributes per object in class trees 


Let’s code one last extension. As it is, our lister doesn’t tell us which class an inherited 
name comes from. As we saw in the classtree.py example near the end of Chapter 28, 
though, it’s straightforward to climb class inheritance trees in code. The following mix- 
in class makes use of this same technique to display attributes grouped by the classes 
they live in—it sketches the full class tree, displaying attributes attached to each object 
along the way. It does so by traversing the inheritance tree from an instance’s 
__class__ to its class, and then from the class’s __bases__ to all superclasses recursively,
scanning object __dict__s along the way:


# File lister.py, continued

class ListTree:
    """
    Mix-in that returns an __str__ trace of the entire class
    tree and all its objects' attrs at and above self;
    run by print(), str() returns constructed string;
    uses __X attr names to avoid impacting clients;
    uses generator expr to recurse to superclasses;
    uses str.format() to make substitutions clearer
    """
    def __str__(self):
        self.__visited = {}
        return '<Instance of {0}, address {1}:\n{2}{3}>'.format(
                           self.__class__.__name__,
                           id(self),
                           self.__attrnames(self, 0),
                           self.__listclass(self.__class__, 4))

    def __listclass(self, aClass, indent):
        dots = '.' * indent
        if aClass in self.__visited:
            return '\n{0}<Class {1}:, address {2}: (see above)>\n'.format(
                           dots,
                           aClass.__name__,
                           id(aClass))
        else:
            self.__visited[aClass] = True
            genabove = (self.__listclass(c, indent+4) for c in aClass.__bases__)
            return '\n{0}<Class {1}, address {2}:\n{3}{4}{5}>\n'.format(
                           dots,
                           aClass.__name__,
                           id(aClass),
                           self.__attrnames(aClass, indent),
                           ''.join(genabove),
                           dots)

    def __attrnames(self, obj, indent):
        spaces = ' ' * (indent + 4)
        result = ''
        for attr in sorted(obj.__dict__):
            if attr.startswith('__') and attr.endswith('__'):
                result += spaces + '{0}=<>\n'.format(attr)
            else:
                result += spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
        return result


Note the use of a generator expression to direct the recursive calls for superclasses; it’s 
activated by the nested string join method. Also see how this version uses the Python 
3.0 and 2.6 string format method instead of % formatting expressions, to make substi- 
tutions clearer; when many substitutions are applied like this, explicit argument num- 
bers may make the code easier to decipher. In short, in this version we exchange the 
first of the following lines for the second: 


return '<Instance of %s, address %s:\n%s%s>' % (...) # Expression 
return '<Instance of {0}, address {1}:\n{2}{3}>'.format(...) # Method 


Now, change testmixin.py to inherit from this new class again to test: 


class Sub(Super, ListTree):             # Mix in a __str__


The file’s tree-sketcher output in Python 2.6 is then as follows: 


C:\misc> c:\python26\python testmixin.py
<Instance of Sub, address 40728496:
    _ListTree__visited={}
    data1=spam
    data2=eggs
    data3=42

....<Class Sub, address 40701168:
        __doc__=<>
        __init__=<>
        __module__=<>
        spam=<unbound method Sub.spam>

........<Class Super, address 40701120:
            __doc__=<>
            __init__=<>
            __module__=<>
            ham=<unbound method Super.ham>
........>

........<Class ListTree, address 40700688:
            _ListTree__attrnames=<unbound method ListTree.__attrnames>
            _ListTree__listclass=<unbound method ListTree.__listclass>
            __doc__=<>
            __module__=<>
            __str__=<>
........>
....>
>
Notice in this output how methods are unbound now under 2.6, because we fetch them
from classes directly, instead of from instances. Also observe how the lister’s 
__visited table has its name mangled in the instance’s attribute dictionary; unless we’re 
very unlucky, this won’t clash with other data there. 


Under Python 3.0, we get extra attributes and superclasses again. Notice that unbound 
methods are simple functions in 3.0, as described in an earlier note in this chapter (and 
that again, I’ve deleted most built-in attributes in object to save space here; run this on 
your own for the complete listing): 
C:\misc> c:\python30\python testmixin.py
<Instance of Sub, address 40635216:
    _ListTree__visited={}
    data1=spam
    data2=eggs
    data3=42

....<Class Sub, address 40914752:
        __doc__=<>
        __init__=<>
        __module__=<>
        spam=<function spam at 0x026D53D8>

........<Class Super, address 40829952:
            __dict__=<>
            __doc__=<>
            __init__=<>
            __module__=<>
            __weakref__=<>
            ham=<function ham at 0x026D5228>

............<Class object, address 505114624:
                __class__=<>
                __delattr__=<>
                __doc__=<>
                __eq__=<>
                ...more omitted...
                __repr__=<>
                __setattr__=<>
                __sizeof__=<>
                __str__=<>
                __subclasshook__=<>
............>
........>

........<Class ListTree, address 40829496:
            _ListTree__attrnames=<function __attrnames at 0x026D5660>
            _ListTree__listclass=<function __listclass at 0x026D56A8>
            __dict__=<>
            __doc__=<>
            __module__=<>
            __str__=<>
            __weakref__=<>

............<Class object:, address 505114624: (see above)>
........>
....>
>


This version avoids listing the same class object twice by keeping a table of classes 
visited so far (this is why an object’s id is included—to serve as a key for a previously 
displayed item). Like the transitive module reloader of Chapter 24, a dictionary works 
to avoid repeats and cycles here because class objects may be dictionary keys; a set 
would provide similar functionality. 
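
As a rough illustration of the set-based alternative, the following sketch climbs a class tree and records visited classes in a set instead of a dictionary. The classtree function and the classes A through D are hypothetical and are not part of lister.py; the code assumes Python 3.X, where every tree ends at object:

```python
def classtree(cls, visited=None, indent=0):
    # Hypothetical helper: returns one display line per class at and above cls
    if visited is None:
        visited = set()                 # Classes are hashable: usable in sets
    dots = '.' * indent
    if cls in visited:
        return ['%s%s (see above)' % (dots, cls.__name__)]
    visited.add(cls)
    lines = [dots + cls.__name__]
    for base in cls.__bases__:          # Left to right, then up
        lines += classtree(base, visited, indent + 4)
    return lines

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass                     # A is reachable by two paths

lines = classtree(D)
print('\n'.join(lines))
```

Because the second path to A stops at the "see above" line, the classes above A are not climbed a second time either.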


This version also takes care to avoid large internal objects by skipping __X__ names
again. If you comment out the test for these names, their values will display normally. 
Here’s an excerpt from the output in 2.6 with this temporary change made (it’s much 
larger in its entirety, and it gets even worse in 3.0, which is why these names are probably 
better skipped!): 


C:\misc> c:\python26\python testmixin.py
...more omitted...

........<Class ListTree, address 40700688:
            _ListTree__attrnames=<unbound method ListTree.__attrnames>
            _ListTree__listclass=<unbound method ListTree.__listclass>
            __doc__=
    Mix-in that returns the __str__ trace of the entire class
    tree and all its objects' attrs at and above self;
    run by print, str returns constructed string;
    uses __X attr names to avoid impacting clients;
    uses generator expr to recurse to superclasses;
    uses str.format() to make substitutions clearer

            __module__=lister
            __str__=<unbound method ListTree.__str__>


For more fun, try mixing this class into something more substantial, like the Button 
class of Python’s tkinter GUI toolkit module. In general, you’ll want to name ListTree
first (leftmost) in a class header, so its __str__ is picked up; Button has one, too,
and the leftmost superclass is searched first in multiple inheritance. The output of 
the following is fairly massive (18K characters), so run this code on your own to see 
the full listing (and if you’re using Python 2.6, recall that you should use Tkinter for 
the module name instead of tkinter): 
>>> from lister import ListTree
>>> from tkinter import Button                  # Both classes have a __str__
>>> class MyButton(ListTree, Button): pass      # ListTree first: use its __str__

>>> B = MyButton(text='spam')
>>> open('savetree.txt', 'w').write(str(B))     # Save to a file for later viewing
18247
>>> print(B)                                    # Print the display here
<Instance of MyButton, address 44355632:
    _ListTree__visited={}
    _name=44355632
    _tclCommands=[]
    ...much more omitted...
>


Of course, there’s much more we could do here (sketching the tree in a GUI might be 
a natural next step), but we’ll leave further work as a suggested exercise. We’ll also 
extend this code in the exercises at the end of this part of the book, to list superclass 
names in parentheses at the start of instance and class displays. 


The main point here is that OOP is all about code reuse, and mix-in classes are a 
powerful example. Like almost everything else in programming, multiple inheritance 
can be a useful device when applied well. In practice, though, it is an advanced feature 
and can become complicated if used carelessly or excessively. We’ll revisit this topic as 
a gotcha at the end of the next chapter. In that chapter, we’ll also meet the new-style 
class model, which modifies the search order for one special multiple inheritance case. 


Supporting slots: Because they scan instance dictionaries, the
ListInstance and ListTree classes presented here don’t directly support
attributes stored in slots, a newer and relatively rarely used option we’ll
meet in the next chapter, where instance attributes are declared in a
__slots__ class attribute. For example, if in testmixin.py we assign
__slots__=['data1'] in Super and __slots__=['data3'] in Sub, only the
data2 attribute is displayed in the instance by these two lister classes;
ListTree also displays data1 and data3, but as attributes of the Super
and Sub class objects and with a special format for their values (technically,
they are class-level descriptors).


To better support slot attributes in these classes, change the __dict__
scanning loops to also iterate through __slots__ lists using code the next
chapter will present, and use the getattr built-in function to fetch values
instead of __dict__ indexing (ListTree already does). Since instances
inherit only the lowest class’s __slots__, you may also need to come up
with a policy when __slots__ lists appear in multiple superclasses
(ListTree already displays them as class attributes). ListInherited is
immune to all this, because dir results combine both __dict__ names
and all classes’ __slots__ names.


Alternatively, as a policy we could simply let our code handle slot-based 
attributes as it currently does, rather than complicating it for a rare, 
advanced feature. Slots and normal instance attributes are different 
kinds of names. We’ll investigate slots further in the next chapter; I 
omitted addressing them in these examples to avoid a forward 
dependency (not counting this note, of course!)—not exactly a valid 
design goal, but reasonable for a book. 
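
For readers who want to experiment before reaching the next chapter, here is one possible shape for such a scan. It is a sketch under assumptions, not the code the next chapter presents: the instance_attrs helper is hypothetical, it assumes Python 3.X (it relies on the new-style __mro__ attribute), it does not handle pseudoprivate slot names, and its test classes declare slots only in Super so that Sub instances still have a __dict__:

```python
def instance_attrs(obj):
    # Hypothetical helper: names stored on the instance itself, whether
    # in its __dict__ or in slots declared by any class in its tree
    names = set(getattr(obj, '__dict__', {}))
    for cls in type(obj).__mro__:
        slots = getattr(cls, '__slots__', ())
        if isinstance(slots, str):      # A lone slot name may be a string
            slots = (slots,)
        names.update(slots)
    names -= {'__dict__', '__weakref__'}    # Slot names for built-in storage
    # hasattr skips slots that have not been assigned a value yet
    return dict((n, getattr(obj, n)) for n in sorted(names) if hasattr(obj, n))

class Super:
    __slots__ = ['data1']
    def __init__(self):
        self.data1 = 'spam'

class Sub(Super):                       # No __slots__ here: also gets a __dict__
    def __init__(self):
        Super.__init__(self)
        self.data2 = 'eggs'

attrs = instance_attrs(Sub())
print(attrs)
```

Using getattr for every name means slot values and dictionary values are fetched the same way, at the cost of treating the two kinds of names as one.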
Classes Are Objects: Generic Object Factories


Sometimes, class-based designs require objects to be created in response to conditions 
that can’t be predicted when a program is written. The factory design pattern allows 
such a deferred approach. Due in large part to Python’s flexibility, factories can take 
multiple forms, some of which don’t seem special at all. 


Because classes are objects, it’s easy to pass them around a program, store them in data 
structures, and so on. You can also pass classes to functions that generate arbitrary 
kinds of objects; such functions are sometimes called factories in OOP design circles. 
Factories are a major undertaking in a strongly typed language such as C++ but are 
almost trivial to implement in Python. The call syntax we met in Chapter 18 can call 
any class with any number of constructor arguments in one step to generate any sort 
of instance:* 


def factory(aClass, *args):                 # Varargs tuple
    return aClass(*args)                    # Call aClass (or apply in 2.6 only)

class Spam:
    def doit(self, message):
        print(message)

class Person:
    def __init__(self, name, job):
        self.name = name
        self.job = job

object1 = factory(Spam)                      # Make a Spam object
object2 = factory(Person, "Guido", "guru")   # Make a Person object


In this code, we define an object generator function called factory. It expects to be
passed a class object (any class will do) along with zero or more arguments for the class’s
constructor. The function uses special “varargs” call syntax to call the class and
return an instance.


The rest of the example simply defines two classes and generates instances of both by 
passing them to the factory function. And that’s the only factory function you’ll ever 
need to write in Python; it works for any class and any constructor arguments. 


One possible improvement worth noting is that to support keyword arguments in constructor
calls, the factory can collect them with a **kwargs argument and pass them along
in the class call, too:


def factory(aClass, *args, **kwargs):        # +kwargs dict
    return aClass(*args, **kwargs)           # Call aClass
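
A few calls show what the keyword-aware version allows. In this minimal sketch, the Person class is given a default for job purely for illustration (the earlier version has no default), and the call to dict is included only to show that any callable can be passed:

```python
def factory(aClass, *args, **kwargs):
    return aClass(*args, **kwargs)

class Person:
    def __init__(self, name, job='dev'):        # Default added for illustration
        self.name = name
        self.job = job

p1 = factory(Person, 'Guido')                   # Positional only, default job
p2 = factory(Person, 'Guido', job='guru')       # Positional plus keyword
p3 = factory(Person, name='Bob', job='dev')     # Keywords only
d = factory(dict, a=1, b=2)                     # A built-in type works too
print(p1.job, p2.job, p3.name, d)
```

The factory itself never inspects its arguments; it simply forwards whatever it receives, so argument errors surface from the class being called.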


* Actually, this syntax can invoke any callable object, including functions, classes, and methods. Hence, the
factory function here can also run any callable object, not just a class (despite the argument name). Also, as 
we learned in Chapter 18, Python 2.6 has an alternative to aClass(*args): the apply(aClass, args) built-in 
call, which has been removed in Python 3.0 because of its redundancy and limitations. 
By now, you should know that everything is an “object” in Python, including things
like classes, which are just compiler input in languages like C++. However, as men- 
tioned at the start of this part of the book, only objects derived from classes are OOP 
objects in Python. 


Why Factories? 


So what good is the factory function (besides providing an excuse to illustrate class 
objects in this book)? Unfortunately, it’s difficult to show applications of this design 
pattern without listing much more code than we have space for here. In general, though, 
such a factory might allow code to be insulated from the details of dynamically con- 
figured object construction. 


For instance, recall the processor example presented in the abstract in Chapter 25, and 
then again as a composition example in this chapter. It accepts reader and writer objects 
for processing arbitrary data streams. The original version of this example manually 
passed in instances of specialized classes like FileWriter and SocketReader to customize 
the data streams being processed; later, we passed in hardcoded file, stream, and 
formatter objects. In a more dynamic scenario, external devices such as configuration 
files or GUIs might be used to configure the streams. 


In such a dynamic world, we might not be able to hardcode the creation of stream 
interface objects in our scripts, but might instead create them at runtime according to 
the contents of a configuration file. 


For example, the file might simply give the string name of a stream class to be imported 
from a module, plus an optional constructor call argument. Factory-style functions or 
code might come in handy here because they would allow us to fetch and pass in classes 
that are not hardcoded in our program ahead of time. Indeed, those classes might not 
even have existed at all when we wrote our code: 


classname = ...parse from config file... 

classarg = ...parse from config file... 

import streamtypes # Customizable code 
aclass = getattr(streamtypes, classname) # Fetch from module 
reader = factory(aclass, classarg) # Or aclass(classarg) 
processor(reader, ...) 


Here, the getattr built-in is again used to fetch a module attribute given a string name 
(it’s like saying obj.attr, but attr is a string). Because this code snippet assumes a 
single constructor argument, it doesn’t strictly need factory or apply—we could make 
an instance with just aclass(classarg). They may prove more useful in the presence 
of unknown argument lists, however, and the general factory coding pattern can im- 
prove the code’s flexibility. 
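To make the configuration-driven idea a bit more concrete, here is a minimal runnable sketch. The module name, class name, and arguments are hardcoded stand-ins for values that would really be parsed from a configuration file, and the standard library's io.StringIO is used only because it is always available; it is not part of the processor example:

```python
import importlib

def factory(aClass, *args, **kwargs):
    return aClass(*args, **kwargs)

# Stand-ins for values that would be parsed from a configuration file
modname   = 'io'                  # Assumed module name
classname = 'StringIO'            # Assumed class name within that module
classargs = ('spam and eggs',)    # Assumed constructor arguments

module = importlib.import_module(modname)    # Import module by string name
aclass = getattr(module, classname)          # Fetch class by string name
reader = factory(aclass, *classargs)         # Works for any argument count
print(reader.read())
```

Since the arguments arrive as a tuple, the factory call handles zero, one, or many constructor arguments without special cases.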
Other Design-Related Topics


In this chapter, we’ve seen inheritance, composition, delegation, multiple inheritance, 
bound methods, and factories—all common patterns used to combine classes in Python 
programs. We’ve really only scratched the surface here in the design patterns domain, 
though. Elsewhere in this book you'll find coverage of other design-related topics, such 
as: 


• Abstract superclasses (Chapter 28)
• Decorators (Chapters 31 and 38)
• Type subclasses (Chapter 31)
• Static and class methods (Chapter 31)
• Managed attributes (Chapter 37)
• Metaclasses (Chapters 31 and 39)


For more details on design patterns, though, we’ll delegate to other resources on OOP 
at large. Although patterns are important in OOP work, and are often more natural in 
Python than other languages, they are not specific to Python itself. 


Chapter Summary 


In this chapter, we sampled common ways to use and combine classes to optimize their 
reusability and factoring benefits—what are usually considered design issues that are 
often independent of any particular programming language (though Python can make 
them easier to implement). We studied delegation (wrapping objects in proxy classes), 
composition (controlling embedded objects), and inheritance (acquiring behavior from 
other classes), as well as some more esoteric concepts such as pseudoprivate attributes, 
multiple inheritance, bound methods, and factories. 


The next chapter ends our look at classes and OOP by surveying more advanced class- 
related topics; some of its material may be of more interest to tool writers than appli- 
cation programmers, but it still merits a review by most people who will do OOP in 
Python. First, though, another quick chapter quiz. 


Test Your Knowledge: Quiz 


1. What is multiple inheritance?
2. What is delegation?
3. What is composition?
4. What are bound methods?
5. What are pseudoprivate attributes used for?


Test Your Knowledge: Answers 


1. Multiple inheritance occurs when a class inherits from more than one superclass;
   it’s useful for mixing together multiple packages of class-based code. The
   left-to-right order in class statement headers determines the order of attribute
   searches.

2. Delegation involves wrapping an object in a proxy class, which adds extra behavior
   and passes other operations to the wrapped object. The proxy retains the interface
   of the wrapped object.

3. Composition is a technique whereby a controller class embeds and directs a number
   of objects, and provides an interface all its own; it’s a way to build up larger
   structures with classes.

4. Bound methods combine an instance and a method function; you can call them
   without passing in an instance object explicitly because the original instance is
   still available.

5. Pseudoprivate attributes (whose names begin with two leading underscores: __X)
   are used to localize names to the enclosing class. This includes both class
   attributes like methods defined inside the class, and self instance attributes
   assigned inside the class. Such names are expanded to include the class name,
   which makes them unique.
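As a quick illustration of the last answer, the following sketch shows the name expansion at work. The class names C1, C2, and C3 are assumed here for demonstration only:

```python
class C1:
    def __init__(self):
        self.__X = 1              # Stored on the instance as _C1__X

class C2:
    def __init__(self):
        self.__X = 2              # Stored on the instance as _C2__X

class C3(C1, C2):
    def __init__(self):
        C1.__init__(self)         # Run both constructors on one instance
        C2.__init__(self)

I = C3()
print(sorted(I.__dict__))         # Both names survive without clashing
```

Without the two leading underscores, the second assignment would simply overwrite the first in the shared instance.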
CHAPTER 31
Advanced Class Topics 


This chapter concludes our look at OOP in Python by presenting a few more advanced 
class-related topics: we will survey subclassing built-in types, “new-style” class changes 
and extensions, static and class methods, function decorators, and more. 


As we’ve seen, Python’s OOP model is, at its core, very simple, and some of the topics 
presented in this chapter are so advanced and optional that you may not encounter 
them very often in your Python applications-programming career. In the interest of 
completeness, though, we’ll round out our discussion of classes with a brief look at 
these advanced tools for OOP work. 


As usual, because this is the last chapter in this part of the book, it ends with a section 
on class-related “gotchas,” and the set of lab exercises for this part. I encourage you to 
work through the exercises to help cement the ideas we’ve studied here. I also suggest 
working on or studying larger OOP Python projects as a supplement to this book. As 
with much in computing, the benefits of OOP tend to become more apparent with 
practice. 


Content note: This chapter collects advanced class topics, but some are
even too advanced for this chapter to cover well. Topics such as properties,
descriptors, decorators, and metaclasses are only briefly mentioned here,
and are covered more fully in the final part of this book. Be
sure to look ahead for more complete examples and extended coverage
of some of the subjects that fall into this chapter’s category.


Extending Built-in Types 


Besides implementing new kinds of objects, classes are sometimes used to extend the 
functionality of Python’s built-in types to support more exotic data structures. For 
instance, to add queue insert and delete methods to lists, you can code classes that wrap 
(embed) a list object and export insert and delete methods that process the list specially, 
like the delegation technique we studied in Chapter 30. As of Python 2.2, you can also 
use inheritance to specialize built-in types. The next two sections show both techniques
in action. 


Extending Types by Embedding 


Remember those set functions we wrote in Chapters 16 and 18? Here’s what they look 
like brought back to life as a Python class. The following example (the file 
setwrapper.py) implements a new set object type by moving some of the set functions 
to methods and adding some basic operator overloading. For the most part, this class 
just wraps a Python list with extra set operations. But because it’s a class, it also supports
multiple instances and customization by inheritance in subclasses. Unlike our earlier 
functions, using classes here allows us to make multiple self-contained set objects with 
preset data and behavior, rather than passing lists into functions manually: 

class Set:
    def __init__(self, value = []):      # Constructor
        self.data = []                   # Manages a list
        self.concat(value)

    def intersect(self, other):          # other is any sequence
        res = []                         # self is the subject
        for x in self.data:
            if x in other:               # Pick common items
                res.append(x)
        return Set(res)                  # Return a new Set

    def union(self, other):              # other is any sequence
        res = self.data[:]               # Copy of my list
        for x in other:                  # Add items in other
            if not x in res:
                res.append(x)
        return Set(res)

    def concat(self, value):             # value: list, Set...
        for x in value:                  # Removes duplicates
            if not x in self.data:
                self.data.append(x)

    def __len__(self):          return len(self.data)            # len(self)
    def __getitem__(self, key): return self.data[key]            # self[i]
    def __and__(self, other):   return self.intersect(other)     # self & other
    def __or__(self, other):    return self.union(other)         # self | other
    def __repr__(self):         return 'Set:' + repr(self.data)  # print()


To use this class, we make instances, call methods, and run defined operators as usual:


x = Set([1, 3, 5, 7])
print(x.union(Set([1, 4, 7])))           # prints Set:[1, 3, 5, 7, 4]
print(x | Set([1, 4, 6]))                # prints Set:[1, 3, 5, 7, 4, 6]
Overloading operations such as indexing enables instances of our Set class to
masquerade as real lists. Because you will interact with and extend this class in an exercise
at the end of this chapter, I won’t say much more about this code until Appendix B. 
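As a small taste of the masquerading, though, here is a sketch that uses a trimmed-down variant of the wrapper (just the constructor and three operator methods, recoded here so the example stands alone) and assumes Python 3.X:

```python
class Set:
    # Trimmed-down variant of the wrapper class, for illustration only
    def __init__(self, value=()):
        self.data = []
        for x in value:                       # Drop duplicates on the way in
            if x not in self.data:
                self.data.append(x)
    def __len__(self):          return len(self.data)
    def __getitem__(self, key): return self.data[key]
    def __repr__(self):         return 'Set:' + repr(self.data)

x = Set([1, 3, 5, 7, 3])
print(len(x), x[0], x[-1])            # len() and indexing are delegated
print([item * 2 for item in x])       # Iteration falls back on __getitem__
print(3 in x, 4 in x)                 # So does the in membership test
print(x[1:3])                         # Slices reach the embedded list too
```

Note that the slice result is a plain list, not a Set, because the embedded list produces it; a fuller version might rewrap such results.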


Extending Types by Subclassing 


Beginning with Python 2.2, all the built-in types in the language can now be subclassed 
directly. Type-conversion functions such as list, str, dict, and tuple have become 
built-in type names—although transparent to your script, a type-conversion call (e.g., 
list('spam')) is now really an invocation of a type’s object constructor.


This change allows you to customize or extend the behavior of built-in types with user- 
defined class statements: simply subclass the new type names to customize them. In- 
stances of your type subclasses can be used anywhere that the original built-in type can 
appear. For example, suppose you have trouble getting used to the fact that Python list 
offsets begin at 0 instead of 1. Not to worry—you can always code your own subclass 
that customizes this core behavior of lists. The file typesubclass.py shows how: 


# Subclass built-in list type/class
# Map 1..N to 0..N-1; call back to built-in version.

class MyList(list):
    def __getitem__(self, offset):
        print('(indexing %s at %s)' % (self, offset))
        return list.__getitem__(self, offset - 1)

if __name__ == '__main__':
    print(list('abc'))
    x = MyList('abc')               # __init__ inherited from list
    print(x)                        # __repr__ inherited from list

    print(x[1])                     # MyList.__getitem__
    print(x[3])                     # Customizes list superclass method

    x.append('spam'); print(x)      # Attributes from list superclass
    x.reverse();      print(x)


In this file, the MyList subclass extends the built-in list’s __getitem__ indexing method
only to map indexes 1 to N back to the required 0 to N-1. All it really does is decrement 
the submitted index and call back to the superclass’s version of indexing, but it’s 
enough to do the trick: 


% python typesubclass.py
['a', 'b', 'c']
['a', 'b', 'c']
(indexing ['a', 'b', 'c'] at 1)
a
(indexing ['a', 'b', 'c'] at 3)
c
['a', 'b', 'c', 'spam']
['spam', 'c', 'b', 'a']

This output also includes tracing text the class prints on indexing. Of course, whether
changing indexing this way is a good idea in general is another issue—users of your 
MyList class may very well be confused by such a core departure from Python sequence 
behavior. The ability to customize built-in types this way can be a powerful asset, 
though. 
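One subtlety worth a sketch, assuming Python 3.X: slice expressions also arrive at __getitem__ there, as slice objects, so subtracting 1 from the argument unconditionally would raise a TypeError for an expression like x[0:2]. The variant below tests for slices first; leaving slices 0-based is an arbitrary choice made for this illustration, not something the original file does:

```python
class MyList(list):
    def __getitem__(self, offset):
        if isinstance(offset, slice):                 # x[i:j] arrives as a slice object
            return list.__getitem__(self, offset)     # Left 0-based: an arbitrary choice
        return list.__getitem__(self, offset - 1)     # Map 1..N to 0..N-1

x = MyList('abc')
print(x[1], x[3])              # 1-based indexing
print(x[0:2])                  # Slices still use standard offsets
print(isinstance(x, list))     # Usable wherever a list is expected
print([c for c in x])          # List iteration does not route through __getitem__
```

Mixing 1-based indexes with 0-based slices is exactly the sort of inconsistency that may confuse users, which underscores the caution above.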


For instance, this coding pattern gives rise to an alternative way to code a set—as a 
subclass of the built-in list type, rather than a standalone class that manages an em- 
bedded list object, as shown earlier in this section. As we learned in Chapter 5, Python 
today comes with a powerful built-in set object, along with literal and comprehension 
syntax for making new sets. Coding one yourself, though, is still a great way to learn 
about type subclassing in general. 


The following class, coded in the file setsubclass.py, customizes lists to add just methods 
and operators related to set processing. Because all other behavior is inherited from the 
built-in list superclass, this makes for a shorter and simpler alternative: 


class Set(list):
    def __init__(self, value = []):      # Constructor
        list.__init__(self, [])          # Customizes list
        self.concat(value)               # Copies mutable defaults

    def intersect(self, other):          # other is any sequence
        res = []                         # self is the subject
        for x in self:
            if x in other:               # Pick common items
                res.append(x)
        return Set(res)                  # Return a new Set

    def union(self, other):              # other is any sequence
        res = Set(self)                  # Copy me and my list
        res.concat(other)
        return res

    def concat(self, value):             # value: list, Set...
        for x in value:                  # Removes duplicates
            if not x in self:
                self.append(x)

    def __and__(self, other): return self.intersect(other)
    def __or__(self, other):  return self.union(other)
    def __repr__(self):       return 'Set:' + list.__repr__(self)

if __name__ == '__main__':
    x = Set([1,3,5,7])
    y = Set([2,1,4,5,6])
    print(x, y, len(x))
    print(x.intersect(y), y.union(x))
    print(x & y, x | y)
    x.reverse(); print(x)

Here is the output of the self-test code at the end of this file. Because subclassing core
types is an advanced feature, I'll omit further details here, but I invite you to trace
through these results in the code to study its behavior:

% python setsubclass.py
Set:[1, 3, 5, 7] Set:[2, 1, 4, 5, 6] 4
Set:[1, 5] Set:[2, 1, 4, 5, 6, 3, 7]
Set:[1, 5] Set:[1, 3, 5, 7, 2, 4, 6]
Set:[7, 5, 3, 1]

There are more efficient ways to implement sets with dictionaries in Python, which 
replace the linear scans in the set implementations shown here with dictionary index 
operations (hashing) and so run much quicker. (For more details, see Programming 
Python.) If you’re interested in sets, also take another look at the set object type we 
explored in Chapter 5; this type provides extensive set operations as built-in tools. Set 
implementations are fun to experiment with, but they are no longer strictly required in 
Python today. 
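To suggest what a dictionary-based version might look like, here is a rough sketch only. The DictSet name and its exact interface are assumptions made for this illustration, not the implementation in Programming Python; items must be hashable, and the display sorts them, so they must be mutually comparable as well:

```python
class DictSet:
    # Hypothetical dictionary-based set; name and API are assumptions
    def __init__(self, value=()):
        self.data = {}                         # Keys are the set's members
        self.concat(value)
    def concat(self, value):
        for x in value:
            self.data[x] = None                # Hashing removes duplicates
    def intersect(self, other):
        return DictSet(x for x in self.data if x in other)
    def union(self, other):
        res = DictSet(self.data)               # Copy my keys
        res.concat(other)
        return res
    def __contains__(self, x): return x in self.data      # Hashed lookup
    def __len__(self):         return len(self.data)
    def __iter__(self):        return iter(self.data)
    def __and__(self, other):  return self.intersect(other)
    def __or__(self, other):   return self.union(other)
    def __repr__(self):        return 'DictSet:' + repr(sorted(self.data))

x = DictSet([1, 3, 5, 7])
y = DictSet([2, 1, 4, 5, 6])
print(x & y, x | y)       # prints DictSet:[1, 5] DictSet:[1, 2, 3, 4, 5, 6, 7]
```

When both operands are DictSet objects, each membership test in intersect is a hashed dictionary lookup rather than a scan of a list.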


For another type subclassing example, see the implementation of the bool type in Py- 
thon 2.3 and later. As mentioned earlier in the book, bool is a subclass of int with two 
instances (True and False) that behave like the integers 1 and 0 but inherit custom string- 
representation methods that display their names. 
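A few quick checks, runnable as a script, confirm this relationship:

```python
print(isinstance(True, int))       # bool is a subclass of int
print(bool.__bases__)
print(True == 1, False == 0)       # Same values as integers 1 and 0
print(True + 2)                    # Integer math still applies
print(repr(True), repr(1))         # Only the display is customized
```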


The “New-Style” Class Model 


In Release 2.2, Python introduced a new flavor of classes, known as “new-style” classes; 
classes following the original model became known as “classic classes” when compared 
to the new kind. In 3.0 the class story has merged, but it remains split for Python 2.X 
users: 


• As of Python 3.0, all classes are automatically what we used to call “new-style,”
  whether they explicitly inherit from object or not. All classes inherit from object,
  whether implicitly or explicitly, and all objects are instances of object.

• In Python 2.6 and earlier, classes must inherit from object (or another built-in type)
  to be considered “new-style” and obtain all new-style features.


Because all classes are automatically new-style in 3.0, the features of new-style classes 
are simply normal class features. I’ve opted to keep their descriptions in this section 
separate, however, in deference to users of Python 2.X code—classes in such code 
acquire new-style features only when they are derived from object. 


In other words, when Python 3.0 users see descriptions of “new-style” features in this 
section, they should take them to be descriptions of existing features of their classes. 
For 2.6 readers, these are a set of optional extensions. 


In Python 2.6 and earlier, the only syntactic difference for new-style classes is that they 
are derived from either a built-in type, such as list, or a special built-in class known 
as object. The built-in name object is provided to serve as a superclass for new-style
classes if no other built-in type is appropriate to use: 


class newstyle(object):
    ...normal code...


Any class derived from object, or any other built-in type, is automatically treated as a 
new-style class. As long as a built-in type is somewhere in the superclass tree, the new 
class is treated as a new-style class. Classes not derived from built-ins such as object 
are considered classic. 
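Under Python 3.X, the implicit object superclass is easy to verify. The class names in this short sketch are made up for the demonstration:

```python
class Implicit: pass                 # No superclass listed
class Explicit(object): pass         # Same effect in 3.X; required in 2.X
class Sub(list): pass                # A built-in type in the superclass tree

for klass in (Implicit, Explicit, Sub):
    print(klass.__name__, klass.__bases__, issubclass(klass, object))
```

Run under Python 2.6, the first of these would instead be a classic class with an empty __bases__ tuple.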


New-style classes are only slightly different from classic classes, and the ways in which 
they differ are irrelevant to the vast majority of Python users. Moreover, the classic class 
model still available in 2.6 works exactly as it has for almost two decades. 


In fact, new-style classes are almost completely backward compatible with classic 
classes in syntax and behavior; they mostly just add a few advanced new features. 
However, because they modify a handful of class behaviors, they had to be introduced 
as a distinct tool so as to avoid impacting any existing code that depends on the prior 
behaviors. For example, some subtle differences, such as diamond pattern inheritance 
search and the behavior of built-in operations with managed attribute methods such 
as __getattr__, can cause some legacy code to fail if left unchanged.


The next two sections provide overviews of the ways the new-style classes differ and 
the new tools they provide. Again, because all classes are new-style today, these topics 
represent changes to Python 2.X readers but simply additional advanced class topics 
to Python 3.0 readers. 


New-Style Class Changes 


New-style classes differ from classic classes in a number of ways, some of which are 
subtle but can impact existing 2.X code and coding styles. Here are some of the most 
prominent ways they differ: 


Classes and types merged 
Classes are now types, and types are now classes. In fact, the two are essentially 
synonyms. The type(I) built-in returns the class an instance is made from, instead
of a generic instance type, and is normally the same as I.__class__. Moreover, 
classes are instances of the type class, type may be subclassed to customize class 
creation, and all classes (and hence types) inherit from object. 


Inheritance search order 
Diamond patterns of multiple inheritance have a slightly different search order— 
roughly, they are searched across before up, and more breadth-first than depth- 
first. 
Attribute fetch for built-ins
The __getattr__ and __getattribute__ methods are no longer run for attributes
implicitly fetched by built-in operations. This means that they are not called for
__X__ operator overloading method names; the search for such names begins at
classes, not instances.


New advanced tools 
New-style classes have a set of new class tools, including slots, properties,
descriptors, and the __getattribute__ method. Most of these have very specific
tool-building purposes.
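The attribute fetch change in this list is perhaps the easiest to trip over, so a minimal sketch may help. It assumes Python 3.X, and the Wrapper class and its trace message are invented for this illustration:

```python
class Wrapper:
    def __init__(self, obj):
        self.wrapped = obj
    def __getattr__(self, name):               # Run for undefined names only
        print('Trace:', name)
        return getattr(self.wrapped, name)

w = Wrapper([1, 2, 3])
w.append(4)                    # Explicit fetch by name: routed to the list
print(w.__len__())             # Explicit fetch by name: also routed

try:
    len(w)                     # Implicit fetch by a built-in: skips __getattr__
    implicit_ok = True
except TypeError as exc:
    implicit_ok = False
    print('TypeError:', exc)
```

A delegation-based proxy coded as a new-style class must therefore redefine any operator overloading methods it wants to support, rather than relying on __getattr__ to intercept them.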


We discussed the third of these changes briefly in a sidebar in Chapter 27, and we’ll 
revisit it in depth in the contexts of attribute management in Chapter 37 and privacy 
decorators in Chapter 38. Because the first and second of the changes just listed can 
break existing 2.X code, though, let’s explore these in more detail before moving on to 
new-style additions. 


Type Model Changes 


In new-style classes, the distinction between type and class has vanished entirely. 
Classes themselves are types: the type object generates classes as its instances, and 
classes generate instances of their type. In fact, there is no real difference between
built-in types like lists and strings and user-defined types coded as classes. This is why we
can subclass built-in types, as shown earlier in this chapter—because subclassing a 
built-in type such as list qualifies a class as new-style, it becomes a user-defined type. 


Besides allowing us to subclass built-in types, one of the contexts where this becomes 
most obvious is when we do explicit type testing. With Python 2.6’s classic classes, the 
type of a class instance is a generic “instance,” but the types of built-in objects are more 
specific: 


C:\misc> c:\python26\python
>>> class C: pass                        # Classic classes in 2.6
...
>>> I = C()
>>> type(I)                              # Instances are made from classes
<type 'instance'>
>>> I.__class__
<class __main__.C at 0x025085A0>

>>> type(C)                              # But classes are not the same as types
<type 'classobj'>
>>> C.__class__
AttributeError: class C has no attribute '__class__'

>>> type([1, 2, 3])
<type 'list'>
>>> type(list)
<type 'type'>
>>> list.__class__
<type 'type'>

But with new-style classes in 2.6, the type of a class instance is the class it’s created 
from, since classes are simply user-defined types—the type of an instance is its class, 
and the type of a user-defined class is the same as the type of a built-in object type. 
Classes have a __class__ attribute now, too, because they are instances of type: 


C:\misc> c:\python26\python
>>> class C(object): pass                # New-style classes in 2.6
...
>>> I = C()
>>> type(I)                              # Type of instance is class it's made from
<class '__main__.C'>
>>> I.__class__
<class '__main__.C'>

>>> type(C)                              # Classes are user-defined types
<type 'type'>
>>> C.__class__
<type 'type'>

>>> type([1, 2, 3])                      # Built-in types work the same way
<type 'list'>
>>> type(list)
<type 'type'>
>>> list.__class__
<type 'type'>


The same is true for all classes in Python 3.0, since all classes are automatically new- 
style, even if they have no explicit superclasses. In fact, the distinction between built- 
in types and user-defined class types melts away altogether in 3.0: 


C:\misc> c:\python30\python
>>> class C: pass                        # All classes are new-style in 3.0
...
>>> I = C()
>>> type(I)                              # Type of instance is class it's made from
<class '__main__.C'>
>>> I.__class__
<class '__main__.C'>

>>> type(C)                              # Class is a type, and type is a class
<class 'type'>
>>> C.__class__
<class 'type'>

>>> type([1, 2, 3])                      # Classes and built-in types work the same
<class 'list'>
>>> type(list)
<class 'type'>
>>> list.__class__
<class 'type'>

As you can see, in 3.0 classes are types, but types are also classes. Technically, each
class is generated by a metaclass—a class that is normally either type itself, or a subclass 
of it customized to augment or manage generated classes. Besides impacting code that 
does type testing, this turns out to be an important hook for tool developers. We’ll talk 
more about metaclasses later in this chapter, and again in more detail in Chapter 39. 
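As a first glimpse of this hook, the following is only a sketch that assumes Python 3.X syntax; the Meta, Spam, and Eggs names are invented for the demonstration, and the later chapters cover the details:

```python
class Meta(type):                              # A metaclass is a subclass of type
    def __new__(meta, name, bases, namespace):
        print('Creating class', name)          # Runs when a class statement ends
        return type.__new__(meta, name, bases, namespace)

class Spam(metaclass=Meta):                    # 3.X syntax for naming a metaclass
    pass

Eggs = type('Eggs', (), {'attr': 1})           # Calling type directly makes a class too

print(type(Spam), type(Eggs))
print(isinstance(Spam, type), Eggs().attr)
```

Both Spam and Eggs are classes; the only difference is which type object was asked to create them.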


Implications for type testing 


Besides providing for built-in type customization and metaclass hooks, the merging of 
classes and types in the new-style class model can impact code that does type testing. 
In Python 3.0, for example, the types of class instances compare directly and mean- 
ingfully, and in the same way as built-in type objects. This follows from the fact that 
classes are now types, and an instance’s type is the instance’s class: 


C:\misc> c:\python30\python
>>> class C: pass
...
>>> class D: pass
...
>>> c = C()
>>> d = D()
>>> type(c) == type(d)                   # 3.0: compares the instances' classes
False

>>> type(c), type(d)
(<class '__main__.C'>, <class '__main__.D'>)
>>> c.__class__, d.__class__
(<class '__main__.C'>, <class '__main__.D'>)

>>> c1, c2 = C(), C()
>>> type(c1) == type(c2)
True


With classic classes in 2.6 and earlier, though, comparing instance types is almost
useless, because all instances have the same “instance” type. To truly compare types, the
instance __class__ attributes must be compared (if you care about portability, this
works in 3.0, too, but it’s not required there):


C:\misc> c:\python26\python
>>> class C: pass
...
>>> class D: pass
...
>>> c = C()
>>> d = D()
>>> type(c) == type(d)                   # 2.6: all instances are same type
True
>>> c.__class__ == d.__class__           # Must compare classes explicitly
False

>>> type(c), type(d)
(<type 'instance'>, <type 'instance'>)
>>> c.__class__, d.__class__
(<class __main__.C at 0x024585A0>, <class __main__.D at 0x024588D0>)


And as you should expect by now, new-style classes in 2.6 work the same as all classes 
in 3.0 in this regard—comparing instance types compares the instances’ classes 
automatically: 


C:\misc> c:\python26\python
>>> class C(object): pass
...
>>> class D(object): pass
...
>>> c = C()
>>> d = D()
>>> type(c) == type(d)                   # 2.6 new-style: same as all in 3.0
False

>>> type(c), type(d)
(<class '__main__.C'>, <class '__main__.D'>)
>>> c.__class__, d.__class__
(<class '__main__.C'>, <class '__main__.D'>)


Of course, as I’ve pointed out numerous times in this book, type checking is usually 
the wrong thing to do in Python programs (we code to object interfaces, not object 
types), and the more general isinstance built-in is more likely what you’ll want to use 
in the rare cases where instance class types must be queried. However, knowledge of 
Python’s type model can help demystify the class model in general. 
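The difference between exact type comparison and isinstance shows up as soon as inheritance enters the picture. In this short sketch, which assumes Python 3.X, the Base and Derived names are hypothetical:

```python
class Base: pass
class Derived(Base): pass

obj = Derived()
print(type(obj) == Base)               # Exact type comparison ignores inheritance
print(type(obj) == Derived)
print(isinstance(obj, Base))           # isinstance accepts subclass instances too
print(isinstance(obj, (int, Base)))    # And can test against a tuple of classes
```

Code that compares type results directly will reject instances of subclasses, which is rarely the intent.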


All objects derive from “object” 


One other ramification of the type change in the new-style class model is that because 
all classes derive (inherit) from the class object either implicitly or explicitly, and be- 
cause all types are now classes, every object derives from the object built-in class, 
whether directly or through a superclass. Consider the following interaction in Python 
3.0 (code an explicit object superclass in 2.6 to make this work equivalently): 


>>> class C: pass
...
>>> X = C()
>>> type(X)                    # Type is now class instance was created from
<class '__main__.C'>
>>> type(C)
<class 'type'>

As before, the type of a class instance is the class it was made from, and the type of a
class is the type class because classes and types have merged. It is also true, though,
that the instance and class are both derived from the built-in object class, since this is
an implicit or explicit superclass of every class:

>>> isinstance(X, object)
True
>>> isinstance(C, object)                # Classes always inherit from object
True


The same holds true for built-in types like lists and strings, because types are classes in 
the new-style model—built-in types are now classes, and their instances derive from 
object, too: 

>>> type('spam') 

<class 'str'> 


>>> type(str) 
<class 'type'> 


>>> isinstance('spam', object) # Same for built-in types (classes) 
True 

>>> isinstance(str, object) 

True 


In fact, type itself derives from object, and object is in turn an instance of type, even
though the two are different objects. This circular relationship caps the object model and
stems from the fact that types are classes that generate classes:

>>> type(type) # All classes are types, and vice versa 

<class 'type'> 

>>> type(object) 

<class 'type'> 


>>> isinstance(type, object) # All classes derive from object, even type 
True 

>>> isinstance(object, type) # Types make classes, and type is a class 
True 

>>> type is object 

False 


In practical terms, this model makes for fewer special cases than the prior type/class 
distinction of classic classes, and it allows us to write code that assumes and uses an 
object superclass. We’ll see examples of the latter later in the book; for now, let’s move 
on to explore other new-style changes. 
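The relationship between type and object can be probed a bit more precisely by separating inheritance (issubclass) from instance-of relationships (isinstance). The following is a short sketch, assuming Python 3.X:

```python
print(type.__bases__)               # type inherits from object
print(object.__bases__)             # object has no superclass at all
print(issubclass(type, object))     # Inheritance: type derives from object
print(issubclass(object, type))     # But object does not inherit from type
print(isinstance(object, type))     # Instead, object is an instance of type
print(isinstance(type, type))       # And type is an instance of itself
```

In other words, the circle is closed by one inheritance link and one instance link, not by two inheritance links.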


Diamond Inheritance Change 


One of the most visible changes in new-style classes is their slightly different inheritance 
search procedures for the so-called diamond pattern of multiple inheritance trees, where 
more than one superclass leads to the same higher superclass further above. The dia- 
mond pattern is an advanced design concept, is coded only rarely in Python practice, 
and has not been discussed in this book, so we won’t dwell on this topic in depth. 


In short, though, with classic classes, the inheritance search procedure is strictly depth 
first, and then left to right—Python climbs all the way to the top, hugging the left side 
of the tree, before it backs up and begins to look further to the right. In new-style classes, 
the search is more breadth-first in such cases—Python first looks in any superclasses 
to the right of the first one searched before ascending all the way to the common
superclass at the top. In other words, the search proceeds across by levels before moving 
up. The search algorithm is a bit more complex than this, but this is as much as most 
programmers need to know. 


Because of this change, lower superclasses can overload attributes of higher super- 
classes, regardless of the sort of multiple inheritance trees they are mixed into. More- 
over, the new-style search rule avoids visiting the same superclass more than once when 
it is accessible from multiple subclasses. 


Diamond inheritance example 


To illustrate, consider this simplistic incarnation of the diamond multiple inheritance 
pattern for classic classes. Here, D’s superclasses B and C both lead to the same common 
ancestor, A: 


>>> class A:
        attr = 1              # Classic (Python 2.6)

>>> class B(A):               # B and C both lead to A
        pass

>>> class C(A):
        attr = 2

>>> class D(B, C):
        pass                  # Tries A before C

>>> x = D()
>>> x.attr                    # Searches x, D, B, A
1


The attribute here is found in superclass A, because with classic classes, the inheritance 
search climbs as high as it can before backing up and moving right—Python will search 
D, B, A, and then C, but will stop when attr is found in A, above B. 


However, with new-style classes derived from a built-in like object, and all classes in 
3.0, the search order is different: Python looks in C (to the right of B) before A (above 
B). That is, it searches D, B, C, and then A, and in this case, stops in C: 


>>> class A(object):
        attr = 1              # New-style ("object" not required in 3.0)

>>> class B(A):
        pass

>>> class C(A):
        attr = 2

>>> class D(B, C):
        pass                  # Tries C before A

>>> x = D()
>>> x.attr                    # Searches x, D, B, C
2


This change in the inheritance search procedure is based upon the assumption that if 
you mix in C lower in the tree, you probably intend to grab its attributes in preference 
to A’s. It also assumes that C is always intended to override A’s attributes in all contexts, 
which is probably true when it’s used standalone but may not be when it’s mixed into 
a diamond with classic classes—you might not even know that C may be mixed in like 
this when you code it. 


Since it is most likely that the programmer meant that C should override A in this case, 
though, new-style classes visit C first. Otherwise, C could be essentially pointless in a 
diamond context: it could not customize A and would be used only for names unique 
to C. 
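If you want to see the search order Python will actually use for a new-style class, you can inspect the class's __mro__ attribute. Here is a brief sketch, assuming Python 3.X and the same four-class diamond:

```python
class A:    attr = 1
class B(A): pass
class C(A): attr = 2
class D(B, C): pass

print([klass.__name__ for klass in D.__mro__])    # New-style search order
print(D().attr)                                   # Found in C, before A
```

The list ends with object, the implicit superclass above A in every new-style tree.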


Explicit conflict resolution 


Of course, the problem with assumptions is that they assume things. If this search order 
deviation seems too subtle to remember, or if you want more control over the search 
process, you can always force the selection of an attribute from anywhere in the tree 
by assigning or otherwise naming the one you want at the place where the classes are 
mixed together: 


>>> class A:                     # Classic
        attr = 1

>>> class B(A):
        pass

>>> class C(A):
        attr = 2

>>> class D(B, C):
        attr = C.attr            # Choose C, to the right

>>> x = D()
>>> x.attr                       # Works like new-style (all 3.0)
2


Here, a tree of classic classes is emulating the search order of new-style classes: the 
assignment to the attribute in D picks the version in C, thereby subverting the normal 
inheritance search path (D.attr will be lowest in the tree). New-style classes can simi- 
larly emulate classic classes by choosing the attribute above at the place where the 
classes are mixed together: 


>>> class A(object):             # New-style
        attr = 1

>>> class B(A):
        pass

>>> class C(A):
        attr = 2

>>> class D(B, C):
        attr = B.attr            # Choose A.attr, above

>>> x = D()
>>> x.attr                       # Works like classic (default 2.6)
1


If you are willing to always resolve conflicts like this, you can largely ignore the search 
order difference and not rely on assumptions about what you meant when you coded 
your classes. 


Naturally, attributes picked this way can also be method functions—methods are nor- 
mal, assignable objects: 

>>> class A:
        def meth(s): print('A.meth')

>>> class C(A):
        def meth(s): print('C.meth')

>>> class B(A):
        pass

>>> class D(B, C): pass          # Use default search order
>>> x = D()                      # Will vary per class type
>>> x.meth()                     # Defaults to classic order in 2.6
A.meth

>>> class D(B, C): meth = C.meth # Pick C's method: new-style (and 3.0)
>>> x = D()
>>> x.meth()
C.meth

>>> class D(B, C): meth = B.meth # Pick B's method: classic
>>> x = D()
>>> x.meth()
A.meth


Here, we select methods by explicitly assigning to names lower in the tree. We might 
also simply call the desired class explicitly; in practice, this pattern might be more 
common, especially for things like constructors: 


class D(B, C):
    def meth(self):              # Redefine lower
        C.meth(self)             # Pick C's method by calling


Such selections by assignment or call at mix-in points can effectively insulate your code 
from this difference in class flavors. Explicitly resolving the conflicts this way ensures 
that your code won’t vary per Python version in the future (apart from perhaps needing 
to derive classes from object or a built-in type for the new-style tools in 2.6). 


Even without the classic/new-style class divergence, the explicit method 
resolution technique shown here may come in handy in multiple inher- 
itance scenarios in general. For instance, if you want part of a superclass 
on the left and part of a superclass on the right, you might need to tell 
Python which same-named attributes to choose by using explicit as- 
signments in subclasses. We’ll revisit this notion in a “gotcha” at the 
end of this chapter. 


Also note that diamond inheritance patterns might be more problematic 
in some cases than I’ve implied here (e.g., what if B and C both have 
required constructors that call to the constructor in A?). Since such con- 
texts are rare in real-world Python, we’ll leave this topic outside this 
book’s scope (but see the super built-in function for hints—besides 
providing generic access to superclasses in single inheritance trees, 
super supports a cooperative mode for resolving some conflicts in mul- 
tiple inheritance trees). 
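
The cooperative mode of `super` mentioned in this note can be sketched as follows. This is only an illustration under assumed names: the `calls` list is a hypothetical recorder added to show the order in which constructors run, and the two-argument `super(Class, self)` form is used because it works for new-style classes in both 2.6 and 3.0.

```python
calls = []                           # Hypothetical recorder of constructor order

class A(object):
    def __init__(self):
        calls.append('A')

class B(A):
    def __init__(self):
        calls.append('B')
        super(B, self).__init__()    # Next class in the instance's search order

class C(A):
    def __init__(self):
        calls.append('C')
        super(C, self).__init__()

class D(B, C):
    def __init__(self):
        calls.append('D')
        super(D, self).__init__()

D()
print(calls)                         # Each constructor runs exactly once
```

When the instance is a `D`, the `super` call in `B` dispatches to `C` rather than directly to `A`, so the shared constructor in `A` runs a single time.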


Scope of search order change 


In sum, by default, the diamond pattern is searched differently for classic and new-style 
classes, and this is a nonbackward-compatible change. Keep in mind, though, that this 
change primarily affects diamond pattern cases of multiple inheritance; new-style class 
inheritance works unchanged for most other inheritance tree structures. Further, it’s 
not impossible that this entire issue may be of more theoretical than practical 
importance—because the new-style search wasn’t significant enough to address until 
Python 2.2 and didn’t become standard until 3.0, it seems unlikely to impact much 
Python code. 


Having said that, I should also note that even though you might not code diamond 
patterns in classes you write yourself, because the implied object superclass is above 
every class in 3.0, every case of multiple inheritance exhibits the diamond pattern today. 
That is, in new-style classes, object automatically plays the role that the class A does in 
the example we just considered. Hence the new-style search rule not only modifies 
logical semantics, but also optimizes performance by avoiding visiting the same class 
more than once. 


Just as important, the implied object superclass in the new-style model provides default 
methods for a variety of built-in operations, including the __str__ and __repr__ display 
format methods. Run a dir(object) to see which methods are provided. Without the 
new-style search order, in multiple inheritance cases the defaults in object would al- 
ways override redefinitions in user-coded classes, unless they were always made in the 
leftmost superclass. In other words, the new-style class model itself makes using the 
new-style search order more critical! 
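
As a rough illustration of this point, the sketch below mixes a class that redefines `__str__` in as the rightmost superclass. The class names `Plain`, `Display`, and `Sub` are invented for this example. Under the new-style order, `object` is searched last, so the redefinition wins over the default.

```python
class Plain(object):                 # Inherits object's default __str__
    pass

class Display(object):
    def __str__(self):               # Redefinition in the rightmost superclass
        return 'Display.__str__'

class Sub(Plain, Display):
    pass

order = [cls.__name__ for cls in Sub.__mro__]
text = str(Sub())
print(order)                         # object comes after both superclasses
print(text)                          # So Display's redefinition is used
```

A depth-first, left-to-right search would instead reach `object` by way of `Plain` before ever visiting `Display`, and the default display would be chosen.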


For a more visual example of the implied object superclass in 3.0, and other examples 
of diamond patterns created by it, see the ListTree class’s output in the lister.py example 
in the preceding chapter, as well as the classtree.py tree walker example in Chapter 28. 


New-Style Class Extensions 


Beyond the changes described in the prior section (which, frankly, may be too academic 
and obscure to matter to many readers of this book), new-style classes provide a handful 
of more advanced class tools that have more direct and practical application. The fol- 
lowing sections provide an overview of each of these additional features, available for 
new-style classes in Python 2.6 and all classes in Python 3.0. 


Instance Slots 


By assigning a sequence of string attribute names to a special __slots__ class attribute, 
it is possible for a new-style class to both limit the set of legal attributes that instances 
of the class will have and optimize memory and speed performance. 


This special attribute is typically set by assigning a sequence of string names to the 
variable __slots__ at the top level of a class statement: only those names in the 
__slots__ list can be assigned as instance attributes. However, like all names in Python, 
instance attribute names must still be assigned before they can be referenced, even if 
they’re listed in __slots__. For example: 

>>> class limiter(object):
        __slots__ = ['age', 'name', 'job']

>>> x = limiter()
>>> x.age                        # Must assign before use
AttributeError: age

>>> x.age = 40
>>> x.age
40
>>> x.ape = 1000                 # Illegal: not in __slots__
AttributeError: 'limiter' object has no attribute 'ape'


Slots are something of a break with Python’s dynamic nature, which dictates that any 
name may be created by assignment. However, this feature is envisioned as both a way 
to catch “typo” errors like this (assignments to illegal attribute names not in 
__slots__ are detected), as well as an optimization mechanism. Allocating a namespace 
dictionary for every instance object can become expensive in terms of memory if many 
instances are created and only a few attributes are required. To save space and speed 
execution (to a degree that can vary per program), instead of allocating a dictionary for 
each instance, slot attributes are stored sequentially for quicker lookup. 


Slots and generic code 


In fact, some instances with slots may not have a __dict__ attribute dictionary at all, 
which can make some metaprograms more complex (including some coded in this 
book). Tools that generically list attributes or access attributes by string name, for 
example, must be careful to use more storage-neutral tools than __dict__, such as the 
getattr, setattr, and dir built-in functions, which apply to attributes based on either 
__dict__ or __slots__ storage. In some cases, both attribute sources may need to be 
queried for completeness. 


For example, when slots are used, instances do not normally have an attribute dic- 
tionary—Python uses the class descriptors feature covered in Chapter 37 to allocate 
space for slot attributes in the instance instead. Only names in the slots list can be 
assigned to instances, but slot-based attributes can still be fetched and set by name 
using generic tools. In Python 3.0 (and in 2.6 for classes derived from object): 


>>> class C:
        __slots__ = ['a', 'b']   # __slots__ means no __dict__ by default

>>> X = C()
>>> X.a = 1
>>> X.__dict__
AttributeError: 'C' object has no attribute '__dict__'
>>> getattr(X, 'a')
1
>>> setattr(X, 'b', 2)           # But getattr() and setattr() still work
>>> X.b
2
>>> 'a' in dir(X)                # And dir() finds slot attributes too
True
>>> 'b' in dir(X)
True

Without an attribute namespace dictionary, it’s not possible to assign new names to 
instances that are not names in the slots list: 

>>> class D:
        __slots__ = ['a', 'b']
        def __init__(self): self.d = 4   # Cannot add new names if no __dict__

>>> X = D()
AttributeError: 'D' object has no attribute 'd'

However, extra attributes can still be accommodated by including __dict__ in 
__slots__, in order to allow for an attribute namespace dictionary. In this case, both 
storage mechanisms are used, but generic tools such as getattr allow us to treat them 
as a single set of attributes: 


>>> class D:
        __slots__ = ['a', 'b', '__dict__']   # List __dict__ to include one too
        c = 3                                # Class attrs work normally
        def __init__(self): self.d = 4       # d put in __dict__, a in __slots__

>>> X = D()
>>> X.d
4
>>> X.__dict__                   # Some objects have both __dict__ and __slots__
{'d': 4}                         # getattr() can fetch either type of attr
>>> X.__slots__
['a', 'b', '__dict__']
>>> X.c
3
>>> X.a                          # All instance attrs undefined until assigned
AttributeError: a
>>> X.a = 1; X.b = 2             # Assign both slots so each can be fetched
>>> getattr(X, 'a'), getattr(X, 'c'), getattr(X, 'd')
(1, 3, 4)


Code that wishes to list all instance attributes generically, though, may still need to 
allow for both storage forms, since dir also returns inherited attributes (this relies on 
dictionary iterators to collect keys): 


>>> for attr in list(X.__dict__) + X.__slots__:
        print(attr, '=>', getattr(X, attr))

d => 4
a => 1
b => 2
__dict__ => {'d': 4}

Since either can be omitted, this is more correctly coded as follows (getattr allows for 
defaults): 


>>> for attr in list(getattr(X, '__dict__', [])) + getattr(X, '__slots__', []):
        print(attr, '=>', getattr(X, attr))

d => 4
a => 1
b => 2
__dict__ => {'d': 4}


Multiple __slots__ lists in superclasses 


Note, however, that this code addresses only slot names in the lowest __slots__ 
attribute inherited by an instance. If multiple classes in a class tree have their own 
__slots__ attributes, generic programs must develop other policies for listing attributes 
(e.g., classifying slot names as attributes of classes, not instances). 


Slot declarations can appear in multiple classes in a class tree, but they are subject to a 
number of constraints that are somewhat difficult to rationalize unless you understand 
the implementation of slots as class-level descriptors (a tool we’ll study in detail in the 
last part of this book): 


• If a subclass inherits from a superclass without a __slots__, the __dict__ attribute 
of the superclass will always be accessible, making a __slots__ in the subclass 
meaningless. 


• If a class defines the same slot name as a superclass, the version of the name defined 
by the superclass slot will be accessible only by fetching its descriptor directly from 
the superclass. 


• Because the meaning of a __slots__ declaration is limited to the class in which it 
appears, subclasses will have a __dict__ unless they also define a __slots__. 


In terms of listing instance attributes generically, slots in multiple classes might require 
manual class tree climbs, dir usage, or a policy that treats slot names as a different 
category of names altogether: 


>>> class E:
        __slots__ = ['c', 'd']            # Superclass has slots

>>> class D(E):
        __slots__ = ['a', '__dict__']     # So does its subclass

>>> X = D()
>>> X.a = 1; X.b = 2; X.c = 3             # The instance is the union
>>> X.a, X.c
(1, 3)

>>> E.__slots__                           # But slots are not concatenated
['c', 'd']
>>> D.__slots__
['a', '__dict__']
>>> X.__slots__                           # Instance inherits *lowest* __slots__
['a', '__dict__']
>>> X.__dict__                            # And has its own attr dict
{'b': 2}

>>> for attr in list(getattr(X, '__dict__', [])) + getattr(X, '__slots__', []):
        print(attr, '=>', getattr(X, attr))

b => 2                                    # Superclass slots missed!
a => 1
__dict__ => {'b': 2}

>>> dir(X)                                # dir() includes all slot names
[...many names omitted... 'a', 'b', 'c', 'd']

When such generality is possible, slots are probably best treated as class attributes, 
rather than trying to mold them to appear the same as normal instance attributes. For 
more on slots in general, see the Python standard manual set. Also watch for an example 
that allows for attributes based on both __slots__ and __dict__ storage in the 
Private decorator discussion of Chapter 38. 


For a prime example of why generic programs may need to care about slots, see the 
lister.py display mix-in classes example in the multiple inheritance section of the prior 
chapter; a note there describes the example’s slot concerns. In such a tool that attempts 
to list attributes generically, slot usage requires either extra code or the implementation 
of policies regarding the handling of slot-based attributes in general. 
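
As one possible version of the "manual class tree climb" policy described above, the following sketch walks an instance's class tree and gathers slot names from every class that declares them. The helper names `slot_names` and `instance_attrs` are hypothetical, introduced only for this illustration, and the code assumes new-style classes, since it relies on the `__mro__` attribute.

```python
def slot_names(instance):
    """Collect slot names declared by every class in the instance's tree."""
    names = []
    for cls in type(instance).__mro__:
        slots = cls.__dict__.get('__slots__', ())
        if isinstance(slots, str):            # A lone string names one slot
            slots = (slots,)
        for name in slots:
            if name not in ('__dict__', '__weakref__') and name not in names:
                names.append(name)
    return names

def instance_attrs(instance):
    """Merge __dict__ entries with whichever slots have been assigned."""
    attrs = dict(getattr(instance, '__dict__', {}))
    for name in slot_names(instance):
        if hasattr(instance, name):           # Unassigned slots are skipped
            attrs[name] = getattr(instance, name)
    return attrs

class E(object):
    __slots__ = ['c', 'd']

class D(E):
    __slots__ = ['a', '__dict__']

X = D()
X.a = 1; X.b = 2; X.c = 3
result = instance_attrs(X)
print(sorted(result.items()))
```

Unlike the simpler loop shown earlier, this version picks up the superclass slot `c`, and it omits `d` because that slot has not been assigned.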


Class Properties 


A mechanism known as properties provides another way for new-style classes to define 
automatically called methods for access or assignment to instance attributes. At least 
for specific attributes, this feature is an alternative to many current uses of the 
__getattr__ and __setattr__ overloading methods we studied in Chapter 29. Properties 
have a similar effect to these two methods, but they incur an extra method call only for 
accesses to names that require dynamic computation. Properties (and slots) are 
based on a new notion of attribute descriptors, which is too advanced for us to cover 
here. 


In short, a property is a type of object assigned to a class attribute name. A property is 
generated by calling the property built-in with three methods (handlers for get, set, and 
delete operations), as well as a docstring; if any argument is passed as None or omitted, 
that operation is not supported. Properties are typically assigned at the top level of a 
class statement [e.g., name = property(...)]. When thus assigned, accesses to the class 
attribute itself (e.g., obj.name) are automatically routed to one of the accessor methods 
passed into the property. For example, the __getattr__ method allows classes to 
intercept undefined attribute references: 

>>> class classic:
        def __getattr__(self, name):
            if name == 'age':
                return 40
            else:
                raise AttributeError

>>> x = classic()
>>> x.age                        # Runs __getattr__
40
>>> x.name                       # Runs __getattr__
AttributeError


Here is the same example, coded with properties instead (note that properties are 
available for all classes but require the new-style object derivation in 2.6 to work prop- 
erly for intercepting attribute assignments): 

>>> class newprops(object):
        def getage(self):
            return 40
        age = property(getage, None, None, None)    # get, set, del, docs

>>> x = newprops()
>>> x.age                        # Runs getage
40
>>> x.name                       # Normal fetch
AttributeError: newprops instance has no attribute 'name'


For some coding tasks, properties can be less complex and quicker to run than the 
traditional techniques. For example, when we add attribute assignment support, 
properties become more attractive—there’s less code to type, and no extra method calls 
are incurred for assignments to attributes we don’t wish to compute dynamically: 


>>> class newprops(object):
        def getage(self):
            return 40
        def setage(self, value):
            print('set age:', value)
            self._age = value
        age = property(getage, setage, None, None)

>>> x = newprops()
>>> x.age                        # Runs getage
40
>>> x.age = 42                   # Runs setage
set age: 42
>>> x._age                       # Normal fetch; no getage call
42
>>> x.job = 'trainer'            # Normal assign; no setage call
>>> x.job                        # Normal fetch; no getage call
'trainer'


The equivalent classic class incurs extra method calls for assignments to attributes not 
being managed and needs to route attribute assignments through the attribute dictionary 
(or, for new-style classes, to the object superclass’s __setattr__) to avoid loops: 


>>> class classic:
        def __getattr__(self, name):            # On undefined reference
            if name == 'age':
                return 40
            else:
                raise AttributeError
        def __setattr__(self, name, value):     # On all assignments
            print('set:', name, value)
            if name == 'age':
                self.__dict__['_age'] = value
            else:
                self.__dict__[name] = value

>>> x = classic()
>>> x.age                        # Runs __getattr__
40
>>> x.age = 41                   # Runs __setattr__
set: age 41
>>> x._age                       # Defined: no __getattr__ call
41
>>> x.job = 'trainer'            # Runs __setattr__ again
set: job trainer
>>> x.job                        # Defined: no __getattr__ call
'trainer'


Properties seem like a win for this simple example. However, some applications of 
__getattr__ and __setattr__ may still require more dynamic or generic interfaces than 
properties directly provide. For example, in many cases, the set of attributes to be 
supported cannot be determined when the class is coded, and may not even exist in 
any tangible form (e.g., when delegating arbitrary method references to a wrapped/ 
embedded object generically). In such cases, a generic __getattr__ or a __setattr__ 
attribute handler with a passed-in attribute name may be preferable. Because such ge- 
neric handlers can also handle simpler cases, properties are often an optional extension. 


For more details on both options, stay tuned for Chapter 37 in the final part of this 
book. As we’ll see there, it’s also possible to code properties using function decorator 
syntax, a topic introduced later in this chapter. 
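
As a brief preview of the decorator form, the sketch below codes a managed `age` attribute with the `property` built-in used as a decorator, along with its `setter` method, which is available as of Python 2.6 and 3.0. The `Person` class is an assumed example, not one taken from this chapter.

```python
class Person(object):
    def __init__(self):
        self._age = 40

    @property
    def age(self):                       # Same as: age = property(age)
        return self._age

    @age.setter
    def age(self, value):                # Adds a set handler to the property
        self._age = value

x = Person()
before = x.age                           # Runs the getter
x.age = 42                               # Runs the setter
after = x.age
print(before, after)
```

The effect is the same as passing get and set functions to `property` manually; the decorator syntax simply keeps the handlers grouped under the attribute's name.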


__getattribute__ and Descriptors 


The __getattribute__ method, available for new-style classes only, allows a class to 
intercept all attribute references, not just undefined references, like __getattr__. It is 
also somewhat trickier to use than __getattr__: it is prone to loops, much like 
__setattr__, but in different ways. 


In addition to properties and operator overloading methods, Python supports the notion 
of attribute descriptors—classes with __get__ and __set__ methods, assigned to 
class attributes and inherited by instances, that intercept read and write accesses to 
specific attributes. Descriptors are in a sense a more general form of properties; in fact, 
properties are a simplified way to define a specific type of descriptor, one that runs 
functions on access. Descriptors are also used to implement the slots feature we met 
earlier. 
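
For a rough idea of what a descriptor looks like, the following sketch codes one by hand. The `AgeDescriptor` and `Person` names are invented for this illustration, and the validation rule is an arbitrary choice; full coverage is deferred to later in the book.

```python
class AgeDescriptor(object):
    def __get__(self, instance, owner):
        if instance is None:             # Fetched through the class itself
            return self
        return instance._age

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('age must be non-negative')
        instance._age = value

class Person(object):
    age = AgeDescriptor()                # Descriptor assigned to a class attribute

    def __init__(self, age):
        self.age = age                   # Routed to AgeDescriptor.__set__

p = Person(40)
p.age = 41                               # Runs __set__
print(p.age)                             # Runs __get__
```

A property built for `age` would behave much the same way; the descriptor class simply makes the get and set protocol explicit.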


Because properties, __getattribute__, and descriptors are somewhat advanced topics, 
we'll defer the rest of their coverage, as well as more on properties, to Chapter 37 in 
the final part of this book. 


Metaclasses 


Most of the changes and feature additions of new-style classes integrate with the notion 
of subclassable types mentioned earlier in this chapter, because subclassable types and 
new-style classes were introduced in conjunction with a merging of the type/class di- 
chotomy in Python 2.2 and beyond. As we’ve seen, in 3.0, this merging is complete: 
classes are now types, and types are classes. 


Along with these changes, Python also grew a more coherent protocol for coding 
metaclasses, which are classes that subclass the type object and intercept class creation 
calls. As such, they provide a well-defined hook for management and augmentation of 
class objects. They are also an advanced topic that is optional for most Python pro- 
grammers, so we’ll postpone further details here. We’ll meet metaclasses briefly later 
in this chapter in conjunction with class decorators, and we’ll explore them in full detail 
in Chapter 39, in the final part of this book. 


Static and Class Methods 


As of Python 2.2, it is possible to define two kinds of methods within a class that can 
be called without an instance: static methods work roughly like simple instance-less 
functions inside a class, and class methods are passed a class instead of an instance. 
Although this feature was added in conjunction with the new-style classes discussed in 
the prior sections, static and class methods work for classic classes too. 


To enable these method modes, special built-in functions called staticmethod and 
classmethod must be called within the class, or invoked with the decoration syntax we'll 
meet later in this chapter. In Python 3.0, instance-less methods called only through a 
class name do not require a staticmethod declaration, but such methods called through 
instances do. 


Why the Special Methods? 


As we’ve learned, a method in a class is normally passed an instance object in its first 
argument, to serve as the implied subject of the method call. Today, though, there are 
two ways to modify this model. Before I explain what they are, I should explain why 
this might matter to you. 


Sometimes, programs need to process data associated with classes instead of instances. 
Consider keeping track of the number of instances created from a class, or maintaining 
a list of all of a class’s instances that are currently in memory. This type of information 
and its processing are associated with the class rather than its instances. That is, the 
information is usually stored on the class itself and processed in the absence of any 
instance. 


For such tasks, simple functions coded outside a class can often suffice—because they 
can access class attributes through the class name, they have access to class data and 
never require access to an instance. However, to better associate such code with a class, 
and to allow such processing to be customized with inheritance as usual, it would be 
better to code these types of functions inside the class itself. To make this work, we 
need methods in a class that are not passed, and do not expect, a self instance 
argument. 


Python supports such goals with the notion of static methods—simple functions with 
no self argument that are nested in a class and are designed to work on class attributes 
instead of instance attributes. Static methods never receive an automatic self argument, 
whether called through a class or an instance. They usually keep track of information 
that spans all instances, rather than providing behavior for instances. 


Although less commonly used, Python also supports the notion of class methods— 
methods of a class that are passed a class object in their first argument instead of an 
instance, regardless of whether they are called through an instance or a class. Such 
methods can access class data through their self class argument even if called through 
an instance. Normal methods (now known in formal circles as instance methods) still 
receive a subject instance when called; static and class methods do not. 


Static Methods in 2.6 and 3.0 


The concept of static methods is the same in both Python 2.6 and 3.0, but its imple- 
mentation requirements have evolved somewhat in Python 3.0. Since this book covers 
both versions, I need to explain the differences in the two underlying models before we 
get to the code. 


Really, we already began this story in the preceding chapter, when we explored the 
notion of unbound methods. Recall that both Python 2.6 and 3.0 always pass an in- 
stance to a method that is called through an instance. However, Python 3.0 treats 
methods fetched directly from a class differently than 2.6: 


• In Python 2.6, fetching a method from a class produces an unbound method, which 
cannot be called without manually passing an instance. 


• In Python 3.0, fetching a method from a class produces a simple function, which 
can be called normally with no instance present. 


In other words, in Python 2.6, methods in a class always require an instance to be passed in, 
whether they are called through an instance or a class. By contrast, in Python 3.0 we 
are required to pass an instance to a method only if the method expects one—methods 
without a self instance argument can be called through the class without passing an 
instance. That is, 3.0 allows simple functions in a class, as long as they do not expect 
and are not passed an instance argument. The net effect is that: 


• In Python 2.6, we must always declare a method as static in order to call it without 
an instance, whether it is called through a class or an instance. 


• In Python 3.0, we need not declare such methods as static if they will be called 
through a class only, but we must do so in order to call them through an instance. 


To illustrate, suppose we want to use class attributes to count how many instances are 
generated from a class. The following file, spam.py, makes a first attempt—its class has 
a counter stored as a class attribute, a constructor that bumps up the counter by one 
each time a new instance is created, and a method that displays the counter’s value. 
Remember, class attributes are shared by all instances. Therefore, storing the counter 
in the class object itself ensures that it effectively spans all instances: 

class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
    def printNumInstances():
        print("Number of instances created: ", Spam.numInstances)


The printNumInstances method is designed to process class data, not instance data— 
it’s about all the instances, not any one in particular. Because of that, we want to be 
able to call it without having to pass an instance. Indeed, we don’t want to make an 
instance to fetch the number of instances, because this would change the number of 
instances we’re trying to fetch! In other words, we want a self-less “static” method. 


Whether this code works or not, though, depends on which Python you use, and which 
way you call the method—through the class or through an instance. In 2.6 (and 2.X in 
general), calls to a self-less method function through both the class and instances fail 
(I’ve omitted some error text here for space): 

C:\misc> c:\python26\python
>>> from spam import Spam
>>> a = Spam()                   # Cannot call unbound class methods in 2.6
>>> b = Spam()                   # Methods expect a self object by default
>>> c = Spam()

>>> Spam.printNumInstances()
TypeError: unbound method printNumInstances() must be called with Spam instance
as first argument (got nothing instead)
>>> a.printNumInstances()
TypeError: printNumInstances() takes no arguments (1 given)


The problem here is that unbound instance methods aren’t exactly the same as simple 
functions in 2.6. Even though there are no arguments in the def header, the method 
still expects an instance to be passed in when it’s called, because the function is asso- 
ciated with a class. In Python 3.0 (and later 3.X releases), calls to self-less methods made 
through classes work, but calls from instances fail: 

C:\misc> c:\python30\python
>>> from spam import Spam
>>> a = Spam()                   # Can call functions in class in 3.0
>>> b = Spam()                   # Calls through instances still pass a self
>>> c = Spam()

>>> Spam.printNumInstances()     # Differs in 3.0
Number of instances created: 3
>>> a.printNumInstances()
TypeError: printNumInstances() takes no arguments (1 given)


That is, calls to instance-less methods like printNumInstances made through the class 
fail in Python 2.6 but work in Python 3.0. On the other hand, calls made through an 
instance fail in both Pythons, because an instance is automatically passed to a method 
that does not have an argument to receive it: 


Spam.printNumInstances()         # Fails in 2.6, works in 3.0
instance.printNumInstances()     # Fails in both 2.6 and 3.0


If you’re able to use 3.0 and stick with calling self-less methods through classes only, 
you already have a static method feature. However, to allow self-less methods to be 
called through classes in 2.6 and through instances in both 2.6 and 3.0, you need to 
either adopt other designs or be able to somehow mark such methods as special. Let’s 
look at both options in turn. 


Static Method Alternatives 


Short of marking a self-less method as special, there are a few different coding structures 
that can be tried. If you want to call functions that access class members without an 
instance, perhaps the simplest idea is to just make them simple functions outside the 
class, not class methods. This way, an instance isn’t expected in the call. For example, 
the following mutation of spam.py works the same in Python 3.0 and 2.6 (albeit dis- 
playing extra parentheses in 2.6 for its print statement): 


def printNumInstances():
    print("Number of instances created: ", Spam.numInstances)

class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1

>>> import spam
>>> a = spam.Spam()
>>> b = spam.Spam()
>>> c = spam.Spam()
>>> spam.printNumInstances()           # But function may be too far removed
Number of instances created: 3         # And cannot be changed via inheritance
>>> spam.Spam.numInstances
3


Because the class name is accessible to the simple function as a global variable, this 
works fine. Also, note that the name of the function becomes global, but only to this 
single module; it will not clash with names in other files of the program. 


Prior to static methods in Python, this structure was the general prescription. Because 
Python already provides modules as a namespace-partitioning tool, one could argue 
that there’s not typically any need to package functions in classes unless they implement 
object behavior. Simple functions within modules like the one here do much of what 
instance-less class methods could, and are already associated with the class because 
they live in the same module. 


Unfortunately, this approach is still less than ideal. For one thing, it adds to this file’s 
scope an extra name that is used only for processing a single class. For another, the 
function is much less directly associated with the class; in fact, its definition could be 
hundreds of lines away. Perhaps worse, simple functions like this cannot be customized 
by inheritance, since they live outside a class’s namespace: subclasses cannot directly 
replace or extend such a function by redefining it. 


We might try to make this example work in a version-neutral way by using a normal 
method and always calling it through (or with) an instance, as usual: 

class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
    def printNumInstances(self):
        print("Number of instances created: ", Spam.numInstances)

>>> from spam import Spam
>>> a, b, c = Spam(), Spam(), Spam()
>>> a.printNumInstances()
Number of instances created: 3
>>> Spam.printNumInstances(a)
Number of instances created: 3
>>> Spam().printNumInstances()         # But fetching counter changes counter!
Number of instances created: 4


Unfortunately, as mentioned earlier, such an approach is completely unworkable if we 
don’t have an instance available, and making an instance changes the class data, as 
illustrated in the last line here. A better solution would be to somehow mark a method 
inside a class as never requiring an instance. The next section shows how. 


Using Static and Class Methods 


Today, there is another option for coding simple functions associated with a class that 
may be called through either the class or its instances. As of Python 2.2, we can code 
classes with static and class methods, neither of which requires an instance argument 
to be passed in when invoked. To designate such methods, classes call the built-in 
functions staticmethod and classmethod, as hinted in the earlier discussion of new-style 
classes. Both mark a function object as special—i.e., as requiring no instance if static 
and requiring a class argument if a class method. For example: 

class Methods: 
    def imeth(self, x):                      # Normal instance method: passed a self 
        print(self, x) 

    def smeth(x):                            # Static: no instance passed 
        print(x) 

    def cmeth(cls, x):                       # Class: gets class, not instance 
        print(cls, x) 

    smeth = staticmethod(smeth)              # Make smeth a static method 
    cmeth = classmethod(cmeth)               # Make cmeth a class method 


Notice how the last two assignments in this code simply reassign the method names 
smeth and cmeth. Attributes are created and changed by any assignment in a class 
statement, so these final assignments simply overwrite the assignments made earlier by 
the defs. 


Technically, Python now supports three kinds of class-related methods: instance, 
static, and class. Moreover, Python 3.0 extends this model by also allowing simple 
functions in a class to serve the role of static methods without extra protocol, when 
called through a class. 


Instance methods are the normal (and default) case that we’ve seen in this book. An 
instance method must always be called with an instance object. When you call it 
through an instance, Python passes the instance to the first (leftmost) argument auto- 
matically; when you call it through a class, you must pass along the instance manually 
(for simplicity, I’ve omitted some class imports in interactive sessions like this one): 


>>> obj = Methods() # Make an instance 

>>> obj.imeth(1) # Normal method, call through instance 
<__main__.Methods object...> 1 # Becomes imeth(obj, 1) 

>>> Methods.imeth(obj, 2) # Normal method, call through class 
<__main__.Methods object...> 2 # Instance passed explicitly 


By contrast, static methods are called without an instance argument. Unlike simple 
functions outside a class, their names are local to the scopes of the classes in which they 
are defined, and they may be looked up by inheritance. Instance-less functions can be 
called through a class normally in Python 3.0, but never by default in 2.6. Using the 
staticmethod built-in allows such methods to also be called through an instance in 3.0 
and through both a class and an instance in Python 2.6 (the first of these works in 3.0 
without staticmethod, but the second does not): 


>>> Methods.smeth(3) # Static method, call through class 

3 # No instance passed or expected 

>>> obj.smeth(4) # Static method, call through instance 
4 # Instance not passed 
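
As a quick check of the 3.0 behavior just described, here is a minimal sketch that codes a function in a class without the staticmethod call. The class name Plain and its show method are made up for illustration, and the code assumes Python 3.X. The call through the class works, but the call through an instance fails, because Python passes the instance to a function that expects just one argument:

```python
class Plain:
    def show(x):                     # No self, and no staticmethod call
        return x

through_class = Plain.show(3)        # 3.X: simple function, no instance passed

try:
    Plain().show(4)                  # Instance is passed too: one argument too many
except TypeError:
    through_instance = 'TypeError'
else:
    through_instance = 'ok'

print(through_class, through_instance)
```

Wrapping show in staticmethod would make both calls work, which is why the built-in is still worth using when a method may be called either way.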


Class methods are similar, but Python automatically passes the class (not an instance) 
in to a class method’s first (leftmost) argument, whether it is called through a class or 
an instance: 


>>> Methods.cmeth(5)                         # Class method, call through class 
<class '__main__.Methods'> 5                 # Becomes cmeth(Methods, 5) 

>>> obj.cmeth(6)                             # Class method, call through instance 
<class '__main__.Methods'> 6                 # Becomes cmeth(Methods, 6) 


Counting Instances with Static Methods 


Now, given these built-ins, here is the static method equivalent of this section’s 
instance-counting example—it marks the method as special, so it will never be passed 
an instance automatically: 


class Spam: 
    numInstances = 0                         # Use static method for class data 
    def __init__(self): 
        Spam.numInstances += 1 
    def printNumInstances(): 
        print("Number of instances:", Spam.numInstances) 
    printNumInstances = staticmethod(printNumInstances) 


Using the static method built-in, our code now allows the self-less method to be called 
through the class or any instance of it, in both Python 2.6 and 3.0: 

>>> a = Spam() 
>>> b = Spam() 
>>> c = Spam() 
>>> Spam.printNumInstances()                 # Call as simple function 
Number of instances: 3 
>>> a.printNumInstances()                    # Instance argument not passed 
Number of instances: 3 


Compared to simply moving printNumInstances outside the class, as prescribed earlier, 
this version requires an extra staticmethod call; however, it localizes the function name 
in the class scope (so it won’t clash with other names in the module), moves the function 
code closer to where it is used (inside the class statement), and allows subclasses to 
customize the static method with inheritance—a more convenient approach than im- 
porting functions from the files in which superclasses are coded. The following subclass 
and new testing session illustrate: 

class Sub(Spam): 
    def printNumInstances():                 # Override a static method 
        print("Extra stuff...")              # But call back to original 
        Spam.printNumInstances() 
    printNumInstances = staticmethod(printNumInstances) 


>>> a = Sub() 
>>> b = Sub() 
>>> a.printNumInstances()                    # Call from subclass instance 
Extra stuff... 
Number of instances: 2 
>>> Sub.printNumInstances()                  # Call from subclass itself 
Extra stuff... 
Number of instances: 2 
>>> Spam.printNumInstances() 
Number of instances: 2 


Moreover, classes can inherit the static method without redefining it—it is run without 
an instance, regardless of where it is defined in a class tree: 

>>> class Other(Spam): pass                  # Inherit static method verbatim 

>>> c = Other() 
>>> c.printNumInstances() 
Number of instances: 3 


Counting Instances with Class Methods 


Interestingly, a class method can do similar work here—the following has the same 
behavior as the static method version listed earlier, but it uses a class method that 
receives the instance’s class in its first argument. Rather than hardcoding the class 
name, the class method uses the automatically passed class object generically: 

class Spam: 
    numInstances = 0                         # Use class method instead of static 
    def __init__(self): 
        Spam.numInstances += 1 
    def printNumInstances(cls): 
        print("Number of instances:", cls.numInstances) 
    printNumInstances = classmethod(printNumInstances) 


This class is used in the same way as the prior versions, but its printNumInstances 
method receives the class, not the instance, when called from both the class and an 
instance: 


>>> a, b = Spam(), Spam() 
>>> a.printNumInstances()                    # Passes class to first argument 
Number of instances: 2 
>>> Spam.printNumInstances()                 # Also passes class to first argument 
Number of instances: 2 


When using class methods, though, keep in mind that they receive the most specific 
(i.e., lowest) class of the call’s subject. This has some subtle implications when trying 
to update class data through the passed-in class. For example, if in module test.py we 
subclass to customize as before, augment Spam.printNumInstances to also display its 
cls argument, and start a new testing session: 


class Spam: 
    numInstances = 0                         # Trace class passed in 
    def __init__(self): 
        Spam.numInstances += 1 
    def printNumInstances(cls): 
        print("Number of instances:", cls.numInstances, cls) 
    printNumInstances = classmethod(printNumInstances) 

class Sub(Spam): 
    def printNumInstances(cls):              # Override a class method 
        print("Extra stuff...", cls)         # But call back to original 
        Spam.printNumInstances() 
    printNumInstances = classmethod(printNumInstances) 

class Other(Spam): pass                      # Inherit class method verbatim 


the lowest class is passed in whenever a class method is run, even for subclasses that 
have no class methods of their own: 


>>> x, y = Sub(), Spam() 
>>> x.printNumInstances()                    # Call from subclass instance 
Extra stuff... <class 'test.Sub'> 
Number of instances: 2 <class 'test.Spam'> 
>>> Sub.printNumInstances()                  # Call from subclass itself 
Extra stuff... <class 'test.Sub'> 
Number of instances: 2 <class 'test.Spam'> 
>>> y.printNumInstances() 
Number of instances: 2 <class 'test.Spam'> 


In the first call here, a class method call is made through an instance of the Sub subclass, 
and Python passes the lowest class, Sub, to the class method. All is well in this case— 
since Sub’s redefinition of the method calls the Spam superclass’s version explicitly, the 
superclass method in Spam receives itself in its first argument. But watch what happens 
for an object that simply inherits the class method: 

>>> z = Other() 


>>> z.printNumInstances() 
Number of instances: 3 <class 'test.Other'> 


This last call here passes Other to Spam’s class method. This works in this example 
because fetching the counter finds it in Spam by inheritance. If this method tried to 
assign to the passed class’s data, though, it would update Other, not Spam! In this 
specific case, Spam is probably better off hardcoding its own class name to update its 
data, rather than relying on the passed-in class argument. 
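
To make this pitfall concrete, here is a small sketch under stated assumptions: the bump method is hypothetical (it is not part of the earlier examples) and assigns to the counter through the passed-in class. When it is called through a subclass that merely inherits it, the assignment creates a new attribute on the subclass and leaves Spam's counter alone:

```python
class Spam:
    numInstances = 0
    def bump(cls):                       # Hypothetical method, for illustration only
        cls.numInstances += 1            # Assigns to whatever class was passed in
    bump = classmethod(bump)

class Other(Spam): pass                  # Inherits bump verbatim

Other.bump()                             # cls is Other, not Spam
print(Spam.numInstances)                 # Spam's counter is unchanged
print(Other.numInstances)                # A new attribute now lives on Other
print('numInstances' in Other.__dict__)
```

The += here fetches the inherited value from Spam but stores the result on Other, which is the same fetch-versus-assign asymmetry that applies to instance attributes.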


Counting instances per class with class methods 


In fact, because class methods always receive the lowest class in an instance’s tree: 


* Static methods and explicit class names may be a better solution for processing 
data local to a class. 


* Class methods may be better suited to processing data that may differ for each class 
in a hierarchy. 


Code that needs to manage per-class instance counters, for example, might be best off 
leveraging class methods. In the following, the top-level superclass uses a class method 
to manage state information that varies for and is stored on each class in the tree— 
similar in spirit to the way instance methods manage state information in class 
instances: 


class Spam: 
    numInstances = 0 
    def count(cls):                          # Per-class instance counters 
        cls.numInstances += 1                # cls is lowest class above instance 
    def __init__(self): 
        self.count()                         # Passes self.__class__ to count 
    count = classmethod(count) 

class Sub(Spam): 
    numInstances = 0 
    def __init__(self):                      # Redefines __init__ 
        Spam.__init__(self) 

class Other(Spam):                           # Inherits __init__ 
    numInstances = 0 


>>> x = Spam() 
>>> y1, y2 = Sub(), Sub() 
>>> z1, z2, z3 = Other(), Other(), Other() 
>>> x.numInstances, y1.numInstances, z1.numInstances 
(1, 2, 3) 
>>> Spam.numInstances, Sub.numInstances, Other.numInstances 
(1, 2, 3) 


Static and class methods have additional advanced roles, which we will finesse here; 
see other resources for more use cases. In recent Python versions, though, the static 
and class method designations have become even simpler with the advent of function 
decoration syntax—a way to apply one function to another that has roles well beyond 
the static method use case that was its motivation. This syntax also allows us to augment 
classes in Python 2.6 and 3.0—to initialize data like the numInstances counter in the 
last example, for instance. The next section explains how. 


Decorators and Metaclasses: Part 1 


Because the staticmethod call technique described in the prior section initially seemed 
obscure to some users, a feature was eventually added to make the operation simpler. 
Function decorators provide a way to specify special operation modes for functions, by 
wrapping them in an extra layer of logic implemented as another function. 


Function decorators turn out to be general tools: they are useful for adding many types 
of logic to functions besides the static method use case. For instance, they may be used 
to augment functions with code that logs calls made to them, checks the types of passed 
arguments during debugging, and so on. In some ways, function decorators are similar 
to the delegation design pattern we explored in Chapter 30, but they are designed to 
augment a specific function or method call, not an entire object interface. 


Python provides some built-in function decorators for operations such as marking static 
methods, but programmers can also code arbitrary decorators of their own. Although 
they are not strictly tied to classes, user-defined function decorators often are coded as 
classes to save the original functions, along with other data, as state information. 
There’s also a more recent related extension available in Python 2.6 and 3.0: class dec- 
orators are directly tied to the class model, and their roles overlap with metaclasses. 


Function Decorator Basics 


Syntactically, a function decorator is a sort of runtime declaration about the function 
that follows. A function decorator is coded on a line by itself just before the def state- 
ment that defines a function or method. It consists of the @ symbol, followed by what 
we call a metafunction—a function (or other callable object) that manages another 
function. Static methods today, for example, may be coded with decorator syntax like 
this: 


class C: 
    @staticmethod                            # Decoration syntax 
    def meth(): 
        ... 


Internally, this syntax has the same effect as the following (passing the function through 
the decorator and assigning the result back to the original name): 


class C: 
    def meth(): 
        ... 
    meth = staticmethod(meth)                # Rebind name 


Decoration rebinds the method name to the decorator’s result. The net effect is that 
calling the method function’s name later actually triggers the result of its 
staticmethod decorator first. Because a decorator can return any sort of object, this 
allows the decorator to insert a layer of logic to be run on every call. The decorator 
function is free to return either the original function itself, or a new object that saves 
the original function passed to the decorator to be invoked indirectly after the extra 
logic layer runs. 
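
The two options in the last sentence can be sketched with simple functions. The names register, doubled, add, and mul below are invented for this illustration: register hands back the original function untouched, while doubled returns a new function that saves the original and runs extra logic around each call:

```python
registry = []

def register(func):                  # Returns the original function unchanged
    registry.append(func.__name__)
    return func

def doubled(func):                   # Returns a new object that wraps the original
    def wrapper(*args):
        return 2 * func(*args)       # Extra logic runs on every call
    return wrapper

@register
def add(a, b):
    return a + b

@doubled
def mul(a, b):
    return a * b

print(registry, add(2, 3), mul(2, 3))
```

In both cases the decorated name is simply rebound to whatever the decorator returns; only doubled inserts a layer between the caller and the original function.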


With this addition, here’s a better way to code our static method example from the 
prior section in either Python 2.6 or 3.0 (the classmethod decorator is used the same 
way): 


class Spam: 
    numInstances = 0 
    def __init__(self): 
        Spam.numInstances = Spam.numInstances + 1 

    @staticmethod 
    def printNumInstances(): 
        print("Number of instances created:", Spam.numInstances) 

a = Spam() 
b = Spam() 
c = Spam() 
Spam.printNumInstances()             # Calls from both classes and instances work now! 
a.printNumInstances()                # Both print "Number of instances created: 3" 


Keep in mind that staticmethod is still a built-in function; it may be used in decoration 
syntax, just because it takes a function as argument and returns a callable. In fact, any 
such function can be used in this way—even user-defined functions we code ourselves, 
as the next section explains. 


A First Function Decorator Example 


Although Python provides a handful of built-in functions that can be used as decorators, 
we can also write custom decorators of our own. Because of their wide utility, we’re 
going to devote an entire chapter to coding decorators in the next part of this book. As 
a quick example, though, let’s look at a simple user-defined decorator at work. 




Recall from Chapter 29 that the __call__ operator overloading method implements a 
function-call interface for class instances. The following code uses this to define a class 
that saves the decorated function in the instance and catches calls to the original name. 
Because this is a class, it also has state information (a counter of calls made): 


class tracer: 
    def __init__(self, func): 
        self.calls = 0 
        self.func = func 
    def __call__(self, *args): 
        self.calls += 1 
        print('call %s to %s' % (self.calls, self.func.__name__)) 
        self.func(*args) 

@tracer                                  # Same as spam = tracer(spam) 
def spam(a, b, c):                       # Wrap spam in a decorator object 
    print(a, b, c) 

spam(1, 2, 3)                            # Really calls the tracer wrapper object 
spam('a', 'b', 'c')                      # Invokes __call__ in class 
spam(4, 5, 6)                            # __call__ adds logic and runs original object 


Because the spam function is run through the tracer decorator, when the original 
spam name is called it actually triggers the __call__ method in the class. This method 
counts and logs the call, and then dispatches it to the original wrapped function. Note 
how the *name argument syntax is used to pack and unpack the passed-in arguments; 
because of this, this decorator can be used to wrap any function with any number of 
positional arguments. 


The net effect, again, is to add a layer of logic to the original spam function. Here is the 
script’s output—the first line comes from the tracer class, and the second comes from 
the spam function: 

call 1 to spam 
1 2 3 
call 2 to spam 
a b c 
call 3 to spam 
4 5 6 


Trace through this example’s code for more insight. As it is, this decorator works for 
any function that takes positional arguments, but it does not return the decorated 
function’s result, doesn’t handle keyword arguments, and cannot decorate class 
method functions (in short, for methods its __call__ would be passed a tracer instance 
only). As we’ll see in Part VIII, there are a variety of ways to code function decorators, 
including nested def statements; some of the alternatives are better suited to methods 
than the version shown here. 
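
One way to address the limitations just listed is to code the decorator with nested functions. The following is only a sketch of that idea, not a definitive version: the Person class and its giveRaise method are made up here to show a decorated method, and storing the call counter as an attribute of the wrapper function is just one of several ways to retain state:

```python
def tracer(func):                        # Nested-function alternative to the class
    def wrapper(*args, **kwargs):
        wrapper.calls += 1               # State kept on the wrapper function itself
        print('call %s to %s' % (wrapper.calls, func.__name__))
        return func(*args, **kwargs)     # Pass keywords through, return the result
    wrapper.calls = 0
    return wrapper

@tracer
def spam(a, b, c=0):
    return a + b + c

class Person:
    def __init__(self, pay):
        self.pay = pay
    @tracer
    def giveRaise(self, percent):        # wrapper is a plain function, so self
        self.pay *= (1.0 + percent)      # arrives as its first argument
        return self.pay

total = spam(1, 2, c=3)                  # Keyword argument and result both work
bob = Person(100)
newpay = bob.giveRaise(0.5)              # Method call: instance passed as usual
```

Because wrapper is a simple function rather than a class instance, Python treats it like any other method when it is fetched through an instance, so the subject instance is passed along in the normal way.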
Class Decorators and Metaclasses 


Function decorators turned out to be so useful that Python 2.6 and 3.0 expanded the 
model, allowing decorators to be applied to classes as well as functions. In short, class 
decorators are similar to function decorators, but they are run at the end of a class 
statement to rebind a class name to a callable. As such, they can be used to either 
manage classes just after they are created, or insert a layer of wrapper logic to manage 
instances when they are later created. Symbolically, the code structure: 


def decorator(aClass): ... 


@decorator 
class C: ... 


is mapped to the following equivalent: 


def decorator(aClass): ... 


class C: ... 
C = decorator(C) 


The class decorator is free to augment the class itself, or return an object that intercepts 
later instance construction calls. For instance, in the example in the section “Counting 
instances per class with class methods,” we could use this hook to automatically 
augment the classes with instance counters and any other data required: 


def count(aClass): 
    aClass.numInstances = 0 
    return aClass                    # Return class itself, instead of a wrapper 

@count 
class Spam: ...                      # Same as Spam = count(Spam) 

@count 
class Sub(Spam): ...                 # numInstances = 0 not needed here 

@count 
class Other(Spam): ... 
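
Filled in, the outline above might look like the following runnable sketch. The class bodies are assumptions based on the earlier per-class counter example, and the class method is named countInstance here (a hypothetical name) simply to keep it distinct from the count decorator:

```python
def count(aClass):
    aClass.numInstances = 0              # Give each decorated class its own counter
    return aClass                        # Return the class itself, not a wrapper

@count
class Spam:
    @classmethod
    def countInstance(cls):              # Hypothetical name, chosen to differ
        cls.numInstances += 1            # from the count decorator above
    def __init__(self):
        self.countInstance()             # Passes the instance's class to cls

@count
class Sub(Spam): pass                    # No numInstances = 0 needed here

@count
class Other(Spam): pass

x = Spam()
y1, y2 = Sub(), Sub()
z1, z2, z3 = Other(), Other(), Other()
print(Spam.numInstances, Sub.numInstances, Other.numInstances)
```

A subclass that omits the @count line would share its superclass's counter on fetches, so the decorator must be applied to every class that needs a counter of its own.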


Metaclasses are a similarly advanced class-based tool whose roles often intersect with 
those of class decorators. They provide an alternate model, which routes the creation 
of a class object to a subclass of the top-level type class, at the conclusion of a class 
statement: 


class Meta(type): 
    def __new__(meta, classname, supers, classdict): ... 

class C(metaclass=Meta): ... 


In Python 2.6, the effect is the same, but the coding differs: use a class attribute instead 
of a keyword argument in the class header: 


class C: 
    __metaclass__ = Meta 


The metaclass generally redefines the __new__ or __init__ method of the type class, in 
order to assume control of the creation or initialization of a new class object. The net 
effect, as with class decorators, is to define code to be run automatically at class creation 
time. Both schemes are free to augment a class or return an arbitrary object to replace 
it—a protocol with almost limitless class-based possibilities. 
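
As a rough illustration of this model, the following sketch uses a metaclass to add an instance counter to each class it creates. It assumes Python 3.X class header syntax, and the counter logic is borrowed from this chapter's earlier examples rather than from any standard recipe:

```python
class Meta(type):
    def __new__(meta, classname, supers, classdict):
        classdict['numInstances'] = 0            # Add a counter to every new class
        return type.__new__(meta, classname, supers, classdict)

class Spam(metaclass=Meta):                      # 3.X syntax
    def __init__(self):
        type(self).numInstances += 1             # Update the instance's own class

class Sub(Spam): pass                            # Subclasses are routed to Meta too

x = Spam()
y1, y2 = Sub(), Sub()
print(Spam.numInstances, Sub.numInstances)
```

Unlike a class decorator, the metaclass is inherited: Sub gets its own counter without naming Meta in its header.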


For More Details 


Naturally, there’s much more to the decorator and metaclass stories than I’ve shown 
here. Although they are a general mechanism, decorators and metaclasses are advanced 
features of interest primarily to tool writers, not application programmers, so we'll defer 
additional coverage until the final part of this book: 


* Chapter 37 shows how to code properties using function decorator syntax. 


* Chapter 38 has much more on decorators, including more comprehensive 
examples. 


* Chapter 39 covers metaclasses, and more on the class and instance management 
story. 


Although these chapters cover advanced topics, they'll also provide us with a chance 
to see Python at work in more substantial examples than much of the rest of the book 
was able to provide. 


Class Gotchas 


Most class issues can be boiled down to namespace issues (which makes sense, given 
that classes are just namespaces with a few extra tricks). Some of the topics we’ll cover 
in this section are more like case studies of advanced class usage than real problems, 
and one or two of these gotchas have been eased by recent Python releases. 


Changing Class Attributes Can Have Side Effects 


Theoretically speaking, classes (and class instances) are mutable objects. Like built-in 
lists and dictionaries, they can be changed in-place by assigning to their attributes— 
and as with lists and dictionaries, this means that changing a class or instance object 
may impact multiple references to it. 


That’s usually what we want (and is how objects change their state in general), but 
awareness of this issue becomes especially critical when changing class attributes. Because 
all instances generated from a class share the class’s namespace, any changes at 
the class level are reflected in all instances, unless they have their own versions of the 
changed class attributes. 


Because classes, modules, and instances are all just objects with attribute namespaces, 
you can normally change their attributes at runtime by assignments. Consider the fol- 
lowing class. Inside the class body, the assignment to the name a generates an attribute 
X.a, which lives in the class object at runtime and will be inherited by all of X’s instances: 


>>> class X: 
...     a = 1                # Class attribute 
... 
>>> I = X() 
>>> I.a                      # Inherited by instance 
1 
>>> X.a 
1 


So far, so good—this is the normal case. But notice what happens when we change the 
class attribute dynamically outside the class statement: it also changes the attribute in 
every object that inherits from the class. Moreover, new instances created from the class 
during this session or program run also get the dynamically set value, regardless of what 
the class’s source code says: 


>>> X.a = 2 # May change more than X 

>>> I.a # I changes too 

2 

>>> J = X() # J inherits from X's runtime values 

>>> J.a # (but assigning to J.a changes a in J, not X or I) 


2 
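
The same steps, plus the instance assignment mentioned in the last comment, can be run as a short script. This is only a restatement of the session above; the tuple names are invented to hold the results:

```python
class X:
    a = 1                    # Class attribute

I = X()
J = X()
X.a = 2                      # Changes the one shared class attribute
seen_by_instances = (I.a, J.a)

J.a = 3                      # Creates an attribute in J only
after_instance_assign = (X.a, I.a, J.a)
print(seen_by_instances, after_instance_assign)
```

After the last assignment, J has its own a that hides the class's version, while X and I still see the class-level value.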


Is this a useful feature or a dangerous trap? You be the judge. As we learned in Chap- 
ter 26, you can actually get work done by changing class attributes without ever making 
a single instance; this technique can simulate the use of “records” or “structs” in other 
languages. As a refresher, consider the following unusual but legal Python program: 


class X: pass                          # Make a few attribute namespaces 
class Y: pass 

X.a = 1                                # Use class attributes as variables 
X.b = 2                                # No instances anywhere to be found 
X.c = 3 
Y.a = X.a + X.b + X.c 

for X.i in range(Y.a): print(X.i)      # Prints 0..5 


Here, the classes X and Y work like “fileless” modules—namespaces for storing variables 
we don’t want to clash. This is a perfectly legal Python programming trick, but it’s less 
appropriate when applied to classes written by others; you can’t always be sure that 
class attributes you change aren’t critical to the class’s internal behavior. If you’re out 
to simulate a C struct, you may be better off changing instances than classes, as that 
way only one object is affected: 


class Record: pass 

X = Record() 
X.name = 'bob' 
X.job = 'Pizza maker' 


Changing Mutable Class Attributes Can Have Side Effects, Too 


This gotcha is really an extension of the prior. Because class attributes are shared by all 
instances, if a class attribute references a mutable object, changing that object in-place 
from any instance impacts all instances at once: 


>>> class C: 
...     shared = []                      # Class attribute 
...     def __init__(self): 
...         self.perobj = []             # Instance attribute 
... 
>>> x = C()                              # Two instances 
>>> y = C()                              # Implicitly share class attrs 
>>> y.shared, y.perobj 
([], []) 
>>> x.shared.append('spam')              # Impacts y's view too! 
>>> x.perobj.append('spam')              # Impacts x's data only 
>>> x.shared, x.perobj 
(['spam'], ['spam']) 
>>> y.shared, y.perobj                   # y sees change made through x 
(['spam'], []) 
>>> C.shared                             # Stored on class and shared 
['spam'] 


This effect is no different than many we’ve seen in this book already: mutable objects 
are shared by simple variables, globals are shared by functions, module-level objects 
are shared by multiple importers, and mutable function arguments are shared by the 
caller and the callee. All of these are cases of general behavior—multiple references to 
a mutable object—and all are impacted if the shared object is changed in-place from 
any reference. Here, this occurs in class attributes shared by all instances via inheri- 
tance, but it’s the same phenomenon at work. It may be made more subtle by the 
different behavior of assignments to instance attributes themselves: 


x.shared.append('spam')      # Changes shared object attached to class in-place 
x.shared = 'spam'            # Changes or creates instance attribute attached to x 


but again, this is not a problem, it’s just something to be aware of; shared mutable class 
attributes can have many valid uses in Python programs. 
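
To see the difference between the two statements side by side, here is a brief sketch that reuses the class C from this section in simplified form; the names in_place and rebound are made up to capture the results:

```python
class C:
    shared = []                          # One list, stored on the class

x, y = C(), C()
x.shared.append('spam')                  # In-place change: visible through C and y
in_place = (C.shared, y.shared)

x.shared = ['eggs']                      # Assignment: new attribute on x only
rebound = (x.shared, y.shared, C.shared)
print(in_place, rebound)
```

The append changes the single list that every reference shares, while the assignment attaches a new object to x and leaves the class's list untouched.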
Multiple Inheritance: Order Matters 


This may be obvious by now, but it’s worth underscoring: if you use multiple inheri- 
tance, the order in which superclasses are listed in the class statement header can be 
critical. Python always searches superclasses from left to right, according to their order 
in the header line. 


For instance, in the multiple inheritance example we studied in Chapter 30, suppose 
that the Super class implemented a __str__ method, too: 


class ListTree: 
    def __str__(self): ... 

class Super: 
    def __str__(self): ... 

class Sub(ListTree, Super):    # Get ListTree's __str__ by listing it first 
    pass 

x = Sub()                      # Inheritance searches ListTree before Super 


Which class would we inherit it from—ListTree or Super? As inheritance searches pro- 
ceed from left to right, we would get the method from whichever class is listed first 
(leftmost) in Sub’s class header. Presumably, we would list ListTree first because its 
whole purpose is its custom __str__ (indeed, we had to do this in Chapter 30 when 
mixing this class with a tkinter.Button that had a __str__ of its own). 


But now suppose Super and ListTree have their own versions of other same-named 
attributes, too. If we want one name from Super and another from ListTree, the order 
in which we list them in the class header won’t help—we will have to override inher- 
itance by manually assigning to the attribute name in the Sub class: 

class ListTree: 
    def __str__(self): ... 
    def other(self): ... 

class Super: 
    def __str__(self): ... 
    def other(self): ... 

class Sub(ListTree, Super):    # Get ListTree's __str__ by listing it first 
    other = Super.other        # But explicitly pick Super's version of other 
    def __init__(self): 
        ... 

x = Sub()                      # Inheritance searches Sub before ListTree/Super 


Here, the assignment to other within the Sub class creates Sub.other, a reference back 
to the Super.other object. Because it is lower in the tree, Sub.other effectively hides 
ListTree.other, the attribute that the inheritance search would normally find. Similarly, 
if we listed Super first in the class header to pick up its other, we would need to 
select ListTree’s method explicitly: 


class Sub(Super, ListTree):              # Get Super's other by order 
    __str__ = ListTree.__str__           # Explicitly pick ListTree.__str__ 


Multiple inheritance is an advanced tool. Even if you understood the last paragraph, 
it’s still a good idea to use it sparingly and carefully. Otherwise, the meaning of a name 
may come to depend on the order in which classes are mixed in an arbitrarily 
far-removed subclass. (For another example of the technique shown here in action, see 
the discussion of explicit conflict resolution in “The ‘New-Style’ Class Model.”) 


As a rule of thumb, multiple inheritance works best when your mix-in classes are as 
self-contained as possible—because they may be used in a variety of contexts, they 
should not make assumptions about names related to other classes in a tree. The 
pseudoprivate __X attributes feature we studied in Chapter 30 can help by localizing 
names that a class relies on owning and limiting the names that your mix-in classes add 
to the mix. In this example, for instance, if ListTree only means to export its custom 
__str__, it can name its other method __other to avoid clashing with like-named classes 
in the tree. 
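
A minimal sketch of this idea follows. The method bodies are invented for illustration, but the name mangling is standard Python behavior: inside the ListTree class statement, __other is automatically expanded to _ListTree__other, so it cannot collide with Super's other:

```python
class ListTree:
    def __str__(self):
        return 'ListTree: ' + self.__other()    # Always finds _ListTree__other
    def __other(self):                          # Mangled to _ListTree__other
        return 'helper'

class Super:
    def other(self):
        return 'Super.other'

class Sub(ListTree, Super): pass                # No manual conflict resolution

x = Sub()
print(str(x), x.other())
```

With the helper localized this way, Sub picks up ListTree's __str__ by position and Super's other by default, with no explicit assignment in the subclass.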


Methods, Classes, and Nested Scopes 


This gotcha went away in Python 2.2 with the introduction of nested function scopes, 
but I’ve retained it here for historical perspective, for readers working with older Python 
releases, and because it demonstrates what happens to the new nested function scope 
rules when one layer of the nesting is a class. 


Classes introduce local scopes, just as functions do, so the same sorts of scope behavior 
can happen in a class statement body. Moreover, methods are further nested functions, 
so the same issues apply. Confusion seems to be especially common when classes are 
nested. 


In the following example (the file nester.py), the generate function returns an instance 
of the nested Spam class. Within its code, the class name Spam is assigned in the 
generate function’s local scope. However, in versions of Python prior to 2.2, within the 
class’s method function the class name Spam is not visible—method has access only to its 
own local scope, the module surrounding generate, and built-in names: 


def generate():                      # Fails prior to Python 2.2, works later 
    class Spam: 
        count = 1 
        def method(self):            # Name Spam not visible: 
            print(Spam.count)        # not local (def), global (module), built-in 
    return Spam() 

generate().method() 


C:\python\examples> python nester.py 
...error text omitted... 
    print(Spam.count)                # Not local (def), global (module), built-in 
NameError: Spam 


This example works in Python 2.2 and later because the local scopes of all enclosing 
function defs are automatically visible to nested defs (including nested method defs, 
as in this example). However, it doesn’t work before 2.2 (we’ll look at some possible 
solutions momentarily). 


Note that even in 2.2 and later, method defs cannot see the local scope of the enclosing 
class; they can only see the local scopes of enclosing defs. That’s why methods must 
go through the self instance or the class name to reference methods and other attributes 
defined in the enclosing class statement. For example, code in the method must use 
self.count or Spam.count, not just count. 


If you’re using a release prior to 2.2, there are a variety of ways to get the preceding 
example to work. One of the simplest is to move the name Spam out to the enclosing 
module’s scope with a global declaration. Because method sees global names in the 
enclosing module, references to Spam will work: 

def generate(): 
    global Spam                      # Force Spam to module scope 
    class Spam: 
        count = 1 
        def method(self): 
            print(Spam.count)        # Works: in global (enclosing module) 
    return Spam() 

generate().method()                  # Prints 1 


A better alternative would be to restructure the code such that the class Spam is defined 
at the top level of the module by virtue of its nesting level, rather than using global 
declarations. The nested method function and the top-level generate will then find 
Spam in their global scopes: 


def generate():
    return Spam()

class Spam:                              # Define at top level of module
    count = 1
    def method(self):
        print(Spam.count)                # Works: in global (enclosing module)

generate().method()


In fact, this approach is recommended for all Python releases—code tends to be simpler 
in general if you avoid nesting classes and functions. 


If you want to get complicated and tricky, you can also get rid of the Spam reference in 
method altogether by using the special __class__ attribute, which returns an instance’s 
class object: 


def generate():
    class Spam:
        count = 1
        def method(self):
            print(self.__class__.count)  # Works: qualify to get class
    return Spam()

generate().method()


Delegation-Based Classes in 3.0: __getattr__ and built-ins 


We met this issue briefly in our class tutorial in Chapter 27 and our delegation coverage 
in Chapter 30: classes that use the __getattr__ operator overloading method to delegate
attribute fetches to wrapped objects will fail in Python 3.0 unless operator overloading 
methods are redefined in the wrapper class. In Python 3.0 (and 2.6, when new-style 
classes are used), the names of operator overloading methods implicitly fetched by 
built-in operations are not routed through generic attribute-interception methods. The 
__str__ method used by printing, for example, never invokes __getattr__. Instead,
Python 3.0 looks up such names in classes and skips the normal runtime instance 
lookup mechanism entirely. To work around this, such methods must be redefined in 
wrapper classes, either by hand, with tools, or by definition in superclasses. We’ll revisit 
this gotcha in Chapters 37 and 38. 
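
As a rough sketch of the issue under Python 3.X (the class names Wrapper and StrWrapper are hypothetical, not taken from the earlier chapters), a wrapper that relies on __getattr__ alone delegates explicit attribute names but not printing, while a subclass that redefines __str__ restores the delegated display:

```python
class Wrapper:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):               # Runs only for undefined names fetched explicitly
        return getattr(self.wrapped, name)


class StrWrapper(Wrapper):
    def __str__(self):                         # Redefined here, so built-ins find it in the class
        return str(self.wrapped)


data = [1, 2, 3]
plain = Wrapper(data)
fixed = StrWrapper(data)

plain.append(4)                                # Explicit name: routed through __getattr__
print(str(plain) == str(data))                 # False in 3.X: the default object display is used
print(str(fixed) == str(data))                 # True: __str__ delegates by hand
```

The same redefinition would be needed for each operator overloading method the wrapped object is expected to support, which is why the text suggests tools or superclasses for larger interfaces.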


“Overwrapping-itis” 


When used well, the code reuse features of OOP make it excel at cutting development 
time. Sometimes, though, OOP’s abstraction potential can be abused to the point of 
making code difficult to understand. If classes are layered too deeply, code can become 
obscure; you may have to search through many classes to discover what an operation 
does. 


For example, I once worked in a C++ shop with thousands of classes (some machine-generated),
and up to 15 levels of inheritance. Deciphering method calls in such a
complex system was often a monumental task: multiple classes had to be consulted for 
even the most basic of operations. In fact, the logic of the system was so deeply wrapped 
that understanding a piece of code in some cases required days of wading through 
related files. 


The most general rule of thumb of Python programming applies here, too: don’t make 
things complicated unless they truly must be. Wrapping your code in multiple layers 
of classes to the point of incomprehensibility is always a bad idea. Abstraction is the 
basis of polymorphism and encapsulation, and it can be a very effective tool when used 
well. However, you’ll simplify debugging and aid maintainability if you make your class 
interfaces intuitive, avoid making your code overly abstract, and keep your class
hierarchies short and flat unless there is a good reason to do otherwise.


Chapter Summary


This chapter presented a handful of advanced class-related topics, including subclass- 
ing built-in types, new-style classes, static methods, and decorators. Most of these are 
optional extensions to the OOP model in Python, but they may become more useful 
as you start writing larger object-oriented programs. As mentioned earlier, our discus- 
sion of some of the more advanced class tools continues in the final part of this book; 
be sure to look ahead if you need more details on properties, descriptors, decorators, 
and metaclasses. 


This is the end of the class part of this book, so you'll find the usual lab exercises at the 
end of the chapter—be sure to work through them to get some practice coding real 
classes. In the next chapter, we’ll begin our look at our last core language topic, ex- 
ceptions. Exceptions are Python’s mechanism for communicating errors and other 
conditions to your code. This is a relatively lightweight topic, but I’ve saved it for last 
because exceptions are supposed to be coded as classes today. Before we tackle that 
final core subject, though, take a look at this chapter’s quiz and the lab exercises. 


Test Your Knowledge: Quiz 


1. Name two ways to extend a built-in object type.

2. What are function decorators used for?

3. How do you code a new-style class?

4. How are new-style and classic classes different?

5. How are normal and static methods different?

6. How long should you wait before lobbing a “Holy Hand Grenade”?


Test Your Knowledge: Answers 


1. You can embed a built-in object in a wrapper class, or subclass the built-in type 
directly. The latter approach tends to be simpler, as most original behavior is au- 
tomatically inherited. 


2. Function decorators are generally used to add to an existing function a layer of 
logic that is run each time the function is called. They can be used to log or count 
calls to a function, check its argument types, and so on. They are also used to 
“declare” static methods—simple functions in a class that are not passed an instance
when called.


3. New-style classes are coded by inheriting from the object built-in class (or any
other built-in type). In Python 3.0, all classes are new-style automatically, so this 
derivation is not required; in 2.6, classes with this derivation are new-style and 
those without it are “classic.” 


4. New-style classes search the diamond pattern of multiple inheritance trees differ- 
ently—they essentially search breadth-first (across), instead of depth-first (up). 
New-style classes also change the result of the type built-in for instances and 
classes, do not run generic attribute fetch methods such as __getattr__ for built-in
operation methods, and support a set of advanced extra tools including properties,
descriptors, and slots instance attribute lists.


5. Normal (instance) methods receive a self argument (the implied instance), but 
static methods do not. Static methods are simple functions nested in class objects. 
To make a method static, it must either be run through a special built-in function 
or be decorated with decorator syntax. Python 3.0 allows simple functions in a 
class to be called through the class without this step, but calls through instances 
still require static method declaration. 


6. Three seconds. (Or, more accurately: “And the Lord spake, saying, ‘First shalt thou 
take out the Holy Pin. Then, shalt thou count to three, no more, no less. Three 
shalt be the number thou shalt count, and the number of the counting shall be 
three. Four shalt thou not count, nor either count thou two, excepting that thou 
then proceed to three. Five is right out. Once the number three, being the third 
number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards 
thy foe, who, being naughty in my sight, shall snuff it.’”)*


* This quote is from Monty Python and the Holy Grail.


Test Your Knowledge: Part VI Exercises


These exercises ask you to write a few classes and experiment with some existing code.
Of course, the problem with existing code is that it must be existing. To work with the
set class in exercise 5, either pull the class source code off this book’s website (see the
Preface for a pointer) or type it up by hand (it’s fairly brief). These programs are starting
to get more sophisticated, so be sure to check the solutions at the end of the book for
pointers. You'll find them in Appendix B, under “Part VI, Classes and OOP” on page 1122.


1. Inheritance. Write a class called Adder that exports a method add(self, x, y) that
prints a “Not Implemented” message. Then, define two subclasses of Adder that
implement the add method:

ListAdder
With an add method that returns the concatenation of its two list arguments

DictAdder
With an add method that returns a new dictionary containing the items in both
of its dictionary arguments (any definition of addition will do)


Experiment by making instances of all three of your classes interactively and calling 
their add methods. 


Now, extend your Adder superclass to save an object in the instance with a con- 
structor (e.g., assign self.data a list or a dictionary), and overload the + operator 
with an __add__ method to automatically dispatch to your add methods (e.g., X +
Y triggers X.add(X.data,Y)). Where is the best place to put the constructors and 
operator overloading methods (i.e., in which classes)? What sorts of objects can 
you add to your class instances? 


In practice, you might find it easier to code your add methods to accept just one 
real argument (e.g., add(self,y)), and add that one argument to the instance’s 
current data (e.g., self.data + y). Does this make more sense than passing two 
arguments to add? Would you say this makes your classes more “object-oriented”? 


2. Operator overloading. Write a class called Mylist that shadows (“wraps”) a Python
list: it should overload most list operators and operations, including +, indexing, 
iteration, slicing, and list methods such as append and sort. See the Python reference 
manual for a list of all possible methods to support. Also, provide a constructor 
for your class that takes an existing list (or a Mylist instance) and copies its com- 
ponents into an instance member. Experiment with your class interactively. Things 
to explore: 


a. Why is copying the initial value important here? 


b. Can you use an empty slice (e.g., start[:]) to copy the initial value if it’s a 
Mylist instance? 


c. Is there a general way to route list method calls to the wrapped list? 
d. Can you add a Mylist and a regular list? How about a list and a Mylist instance?


e. What type of object should operations like + and slicing return? What about 
indexing operations? 

f. If you are working with a more recent Python release (version 2.2 or later), you
may implement this sort of wrapper class by embedding a real list in a stand- 
alone class, or by extending the built-in list type with a subclass. Which is 
easier, and why? 


3. Subclassing. Make a subclass of Mylist from exercise 2 called MylistSub, which
extends Mylist to print a message to stdout before each overloaded operation is 
called and counts the number of calls. MylistSub should inherit basic method be- 
havior from Mylist. Adding a sequence to a MylistSub should print a message, 
increment the counter for + calls, and perform the superclass’s method. Also, in- 
troduce a new method that prints the operation counters to stdout, and experiment 
with your class interactively. Do your counters count calls per instance, or per class 
(for all instances of the class)? How would you program the other option?
(Hint: it depends on which object the count members are assigned to: class members
are shared by instances, but self members are per-instance data.)


4. Metaclass methods. Write a class called Meta with methods that intercept every 
attribute qualification (both fetches and assignments), and print messages listing 
their arguments to stdout. Create a Meta instance, and experiment with qualifying 
it interactively. What happens when you try to use the instance in expressions? Try 
adding, indexing, and slicing the instance of your class. (Note: a fully generic approach
based upon __getattr__ will work in 2.6 but not 3.0, for reasons noted in
Chapter 30 and restated in the solution to this exercise.) 


5. Set objects. Experiment with the set class described in “Extending Types by Em- 
bedding” on page 774. Run commands to do the following sorts of operations: 


a. Create two sets of integers, and compute their intersection and union by using 
& and | operator expressions. 


b. Create a set from a string, and experiment with indexing your set. Which 
methods in the class are called? 


c. Try iterating through the items in your string set using a for loop. Which 
methods run this time? 


d. Try computing the intersection and union of your string set and a simple Py- 
thon string. Does it work? 


e. Now, extend your set by subclassing to handle arbitrarily many operands using 
the *args argument form. (Hint: see the function versions of these algorithms 
in Chapter 18.) Compute intersections and unions of multiple operands with 
your set subclass. How can you intersect three or more sets, given that & has 
only two sides? 


f. How would you go about emulating other list operations in the set class? (Hint: 
__add__ can catch concatenation, and __getattr__ can pass most list method
calls to the wrapped list.) 


6. Class tree links. In “Namespaces: The Whole Story” on page 693 in Chapter 28 
and in “Multiple Inheritance: “Mix-in” Classes” on page 756 in Chapter 30, I 
mentioned that classes have a __bases__ attribute that returns a tuple of their su- 
perclass objects (the ones listed in parentheses in the class header). Use 
__bases__ to extend the lister.py mix-in classes we wrote in Chapter 30 so that they 
print the names of the immediate superclasses of the instance’s class. When you're 
done, the first line of the string representation should look like this (your address 
may vary): 

<Instance of Sub(Super, Lister), address 7841200: 
7. Composition. Simulate a fast-food ordering scenario by defining four classes: 


Lunch
A container and controller class


Customer
The actor who buys food


Employee
The actor from whom a customer orders


Food
What the customer buys


To get you started, here are the classes and methods you'll be defining: 


class Lunch:
    def __init__(self)                           # Make/embed Customer and Employee
    def order(self, foodName)                    # Start a Customer order simulation
    def result(self)                             # Ask the Customer what Food it has

class Customer:
    def __init__(self)                           # Initialize my food to None
    def placeOrder(self, foodName, employee)     # Place order with an Employee
    def printFood(self)                          # Print the name of my food

class Employee:
    def takeOrder(self, foodName)                # Return a Food, with requested name

class Food:
    def __init__(self, name)                     # Store food name


The order simulation should work as follows: 


a. The Lunch class’s constructor should make and embed an instance of 
Customer and an instance of Employee, and it should export a method called 
order. When called, this order method should ask the Customer to place an 
order by calling its placeOrder method. The Customer’s placeOrder method 
should in turn ask the Employee object for a new Food object by calling 
Employee’s takeOrder method. 


b. Food objects should store a food name string (e.g., “burritos”), passed down 
from Lunch.order, to Customer.placeOrder, to Employee.takeOrder, and finally
to Food’s constructor. The top-level Lunch class should also export a method 
called result, which asks the customer to print the name of the food it received 
from the Employee via the order (this can be used to test your simulation). 


Note that Lunch needs to pass either the Employee or itself to the Customer to allow 
the Customer to call Employee methods. 


Experiment with your classes interactively by importing the Lunch class, calling its 
order method to run an interaction, and then calling its result method to verify 
that the Customer got what he or she ordered. If you prefer, you can also simply 
code test cases as self-test code in the file where your classes are defined, using the 
module __name__ trick of Chapter 24. In this simulation, the Customer is the active
agent; how would your classes change if Employee were the object that initiated 
customer/employee interaction instead?


Figure 31-1. A zoo hierarchy composed of classes linked into a tree to be searched by attribute
inheritance. Animal has a common “reply” method, but each class may have its own custom “speak”
method called by “reply”.


8. Zoo animal hierarchy. Consider the class tree shown in Figure 31-1.


Code a set of six class statements to model this taxonomy with Python inheritance. 
Then, add a speak method to each of your classes that prints a unique message, 
and a reply method in your top-level Animal superclass that simply calls 
self.speak to invoke the category-specific message printer in a subclass below (this
will kick off an independent inheritance search from self). Finally, remove the 
speak method from your Hacker class so that it picks up the default above it. When 
you're finished, your classes should work this way: 

% python
>>> from zoo import Cat, Hacker
>>> spot = Cat()
>>> spot.reply()                       # Animal.reply; calls Cat.speak
meow
>>> data = Hacker()                    # Animal.reply; calls Primate.speak
>>> data.reply()
Hello world!


9. The Dead Parrot Sketch. Consider the object embedding structure captured in
Figure 31-2. 


Code a set of Python classes to implement this structure with composition. Code 
your Scene object to define an action method, and embed instances of the Customer, 
Clerk, and Parrot classes (each of which should define a line method that prints 
a unique message). The embedded objects may either inherit from a common su- 
perclass that defines line and simply provide message text, or define line them- 
selves. In the end, your classes should operate like this: 

% python
>>> import parrot
>>> parrot.Scene().action()            # Activate nested objects
customer: "that's one ex-bird!"
clerk: "no it isn't..."
parrot: None


Figure 31-2. A scene composite with a controller class (Scene) that embeds and directs instances of 
three other classes (Customer, Clerk, Parrot). The embedded instance’s classes may also participate 
in an inheritance hierarchy; composition and inheritance are often equally useful ways to structure 
classes for code reuse. 


Why You Will Care: OOP by the Masters 


When I teach Python classes, I invariably find that about halfway through the class, 
people who have used OOP in the past are following along intensely, while people who 
have not are beginning to glaze over (or nod off completely). The point behind the 
technology just isn’t apparent. 


In a book like this, I have the luxury of including material like the new Big Picture 
overview in Chapter 25, and the gradual tutorial of Chapter 27—in fact, you should 
probably review that section if you’re starting to feel like OOP is just some computer 
science mumbo-jumbo. 


In real classes, however, to help get the newcomers on board (and keep them awake), 
I have been known to stop and ask the experts in the audience why they use OOP. The 
answers they’ve given might help shed some light on the purpose of OOP, if you’re new 
to the subject. 


Here, then, with only a few embellishments, are the most common reasons to use OOP, 
as cited by my students over the years: 


Code reuse 
This one’s easy (and is the main reason for using OOP). By supporting inheritance, 
classes allow you to program by customization instead of starting each project from 
scratch. 


Encapsulation 
Wrapping up implementation details behind object interfaces insulates users of a 
class from code changes. 


Structure 
Classes provide new local scopes, which minimizes name clashes. They also pro- 
vide a natural place to write and look for implementation code, and to manage 
object state.


Maintenance
Classes naturally promote code factoring, which allows us to minimize redun- 
dancy. Thanks both to the structure and code reuse support of classes, usually only 
one copy of the code needs to be changed. 


Consistency 
Classes and inheritance allow you to implement common interfaces, and hence 
create a common look and feel in your code; this eases debugging, comprehension, 
and maintenance. 


Polymorphism 
This is more a property of OOP than a reason for using it, but by supporting code 
generality, polymorphism makes code more flexible and widely applicable, and 
hence more reusable. 


Other 
And, of course, the number one reason students gave for using OOP: it looks good 
on a résumé! (OK, I threw this one in as a joke, but it is important to be familiar 
with OOP if you plan to work in the software field today.) 


Finally, keep in mind what I said at the beginning of this part of the book: you won’t
fully appreciate OOP until you’ve used it for a while. Pick a project, study larger examples,
work through the exercises—do whatever it takes to get your feet wet with OO
code; it’s worth the effort.


PART VII


Exceptions and Tools 


CHAPTER 32 
Exception Basics 


This part of the book deals with exceptions, which are events that can modify the flow 
of control through a program. In Python, exceptions are triggered automatically on 
errors, and they can be triggered and intercepted by your code. They are processed by 
four statements we’ll study in this part, the first of which has two variations (listed 
separately here) and the last of which was an optional extension until Python 2.6 and 
3.0: 


try/except 
Catch and recover from exceptions raised by Python, or by you. 
try/finally 
Perform cleanup actions, whether exceptions occur or not. 
raise 
Trigger an exception manually in your code. 
assert 
Conditionally trigger an exception in your code. 
with/as 
Implement context managers in Python 2.6 and 3.0 (optional in 2.5). 


This topic was saved until nearly the end of the book because you need to know about 
classes to code exceptions of your own. With a few exceptions (pun intended), though, 
you'll find that exception handling is simple in Python because it’s integrated into the 
language itself as another high-level tool. 


Why Use Exceptions? 


In a nutshell, exceptions let us jump out of arbitrarily large chunks of a program. Consider
the hypothetical pizza-making robot we discussed earlier in the book. Suppose
we took the idea seriously and actually built such a machine. To make a pizza, our
culinary automaton would need to execute a plan, which we would implement as a
Python program: it would take an order, prepare the dough, add toppings, bake the
pie, and so on.


Now, suppose that something goes very wrong during the “bake the pie” step. Perhaps 
the oven is broken, or perhaps our robot miscalculates its reach and spontaneously 
combusts. Clearly, we want to be able to jump to code that handles such states quickly. 
As we have no hope of finishing the pizza task in such unusual cases, we might as well 
abandon the entire plan. 


That’s exactly what exceptions let you do: you can jump to an exception handler in a 
single step, abandoning all function calls begun since the exception handler was en- 
tered. Code in the exception handler can then respond to the raised exception as ap- 
propriate (by calling the fire department, for instance!). 


One way to think of an exception is as a sort of structured “super go to.” An exception 
handler (try statement) leaves a marker and executes some code. Somewhere further 
ahead in the program, an exception is raised that makes Python jump back to that 
marker, abandoning any active functions that were called after the marker was left. 
This protocol provides a coherent way to respond to unusual events. Moreover, because 
Python jumps to the handler statement immediately, your code is simpler—there is 
usually no need to check status codes after every call to a function that could possibly 
fail. 
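
To make the “marker” idea concrete, here is a small sketch built on the pizza scenario; the function names bake, make_pizza, and run_order are hypothetical and chosen only for illustration. The failure is raised two calls below the try statement, and no intermediate function checks a status code:

```python
def bake(oven_ok):
    if not oven_ok:
        raise RuntimeError('oven is broken')   # Signal the failure from deep in the plan
    return 'baked pie'

def make_pizza(oven_ok):
    pie = bake(oven_ok)                        # No status-code check needed here
    return pie + ' with toppings'

def run_order(oven_ok):
    try:                                       # Leaves the "marker"
        return make_pizza(oven_ok)
    except RuntimeError as exc:                # Jumps here, abandoning bake and make_pizza
        return 'order abandoned: %s' % exc

print(run_order(True))
print(run_order(False))
```

When the oven fails, both bake and make_pizza are exited at once, and control resumes in the except clause of run_order.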


Exception Roles 


In Python programs, exceptions are typically used for a variety of purposes. Here are 
some of their most common roles: 


Error handling 

Python raises exceptions whenever it detects errors in programs at runtime. You 
can catch and respond to the errors in your code, or ignore the exceptions that are 
raised. If an error is ignored, Python’s default exception-handling behavior kicks 
in: it stops the program and prints an error message. If you don’t want this default 
behavior, code a try statement to catch and recover from the exception—Python 
will jump to your try handler when the error is detected, and your program will 
resume execution after the try. 


Event notification 
Exceptions can also be used to signal valid conditions without you having to pass 
result flags around a program or test them explicitly. For instance, a search routine 
might raise an exception on failure, rather than returning an integer result code 
(and hoping that the code will never be a valid result). 


Special-case handling 
Sometimes a condition may occur so rarely that it’s hard to justify convoluting your 
code to handle it. You can often eliminate special-case code by handling unusual 
cases in exception handlers in higher levels of your program.


Termination actions
As you'll see, the try/finally statement allows you to guarantee that required 
closing-time operations will be performed, regardless of the presence or absence 
of exceptions in your programs. 


Unusual control flows 
Finally, because exceptions are a sort of high-level “go to,” you can use them as 
the basis for implementing exotic control flows. For instance, although the lan- 
guage does not explicitly support backtracking, it can be implemented in Python 
by using exceptions and a bit of support logic to unwind assignments.* There is no
“go to” statement in Python (thankfully!), but exceptions can sometimes serve 
similar roles. 
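
The event notification role in the list above can be sketched as follows. This is an illustration under assumed names (NotFound, search, and report are hypothetical), not a routine from the book: the search signals failure with an exception, so no integer has to be reserved as a “not found” code:

```python
class NotFound(Exception):                     # Hypothetical exception used as a signal
    pass

def search(items, target):
    for index, item in enumerate(items):
        if item == target:
            return index                       # Every index, including 0, is a valid result
    raise NotFound(target)                     # No sentinel such as -1 is needed

def report(items, target):
    try:
        return 'found at %d' % search(items, target)
    except NotFound:
        return 'not found'

print(report(['spam', 'eggs'], 'eggs'))
print(report(['spam', 'eggs'], 'ham'))
```

Callers that do not care about the failure case can simply omit the try and let the exception propagate to a higher level.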


We'll see such typical use cases in action later in this part of the book. For now, let’s 
get started with a look at Python’s exception-processing tools. 


Exceptions: The Short Story 


Compared to some other core language topics we’ve met in this book, exceptions are 
a fairly lightweight tool in Python. Because they are so simple, let’s jump right into 
some code. 


Default Exception Handler 


Suppose we write the following function: 


>>> def fetcher(obj, index):
...     return obj[index]
...


There’s not much to this function—it simply indexes an object on a passed-in index. 
In normal operation, it returns the result of a legal index: 


>>> x = 'spam'
>>> fetcher(x, 3)                      # Like x[3]
'm'


However, if we ask this function to index off the end of the string, an exception will be 
triggered when the function tries to run obj[index]. Python detects out-of-bounds indexing
for sequences and reports it by raising (triggering) the built-in IndexError
exception: 


>>> fetcher(x, 4)                      # Default handler - shell interface
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in fetcher
IndexError: string index out of range


* True backtracking is an advanced topic that is not part of the Python language, so I won’t say much more
about it here (even the generator functions and expressions we met in Chapter 20 are not true backtracking—
they simply respond to next(G) requests). Roughly, backtracking undoes all computations before it jumps;
Python exceptions do not (i.e., variables assigned between the time a try statement is entered and the time
an exception is raised are not reset to their prior values). See a book on artificial intelligence or the Prolog or
Icon programming languages if you're curious.


Because our code does not explicitly catch this exception, it filters back up to the top 
level of the program and invokes the default exception handler, which simply prints the 
standard error message. By this point in the book, you’ve probably seen your share of 
standard error messages. They include the exception that was raised, along with a stack 
trace—a list of all the lines and functions that were active when the exception occurred. 
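
If you want to look at this message text without letting the program stop, one option is the standard library's traceback module. The sketch below assumes the same fetcher function, repeated here so the example stands alone; the variable name report is arbitrary:

```python
import traceback

def fetcher(obj, index):
    return obj[index]

try:
    fetcher('spam', 4)
except IndexError:
    report = traceback.format_exc()            # Much the same text the default handler prints

print(report)
```

The captured string starts with the usual “Traceback” header and ends with the exception's name and message; the file names and line numbers in between depend on where the code is run.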


The error message text here was printed by Python 3.0; it can vary slightly per release, 
and even per interactive shell. When coding interactively in the basic shell interface, 
the filename is just “<stdin>,” meaning the standard input stream. When working in 
the IDLE GUI’s interactive shell, the filename is “<pyshell>”, and source lines are dis- 
played, too. Either way, file line numbers are not very meaningful when there is no file 
(we’ll see more interesting error messages later in this part of the book): 

>>> fetcher(x, 4)                      # Default handler - IDLE GUI interface
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    fetcher(x, 4)
  File "<pyshell#3>", line 2, in fetcher
    return obj[index]
IndexError: string index out of range


In a more realistic program launched outside the interactive prompt, after printing an 
error message the default handler at the top also terminates the program immediately. 
That course of action makes sense for simple scripts; errors often should be fatal, and 
the best you can do when they occur is inspect the standard error message. 


Catching Exceptions 


Sometimes, this isn’t what you want, though. Server programs, for instance, typically 
need to remain active even after internal errors. If you don’t want the default exception 
behavior, wrap the call in a try statement to catch exceptions yourself: 

>>> try:
...     fetcher(x, 4)
... except IndexError:                 # Catch and recover
...     print('got exception')
...
got exception
>>>

Now, Python jumps to your handler (the block under the except clause that names the 
exception raised) automatically when an exception is triggered while the try block is 
running. When working interactively like this, after the except clause runs, we wind 
up back at the Python prompt. In a more realistic program, try statements not only 
catch exceptions, but also recover from them:

>>> def catcher():
...     try:
...         fetcher(x, 4)
...     except IndexError:
...         print('got exception')
...     print('continuing')
...
>>> catcher()
got exception
continuing
>>>


This time, after the exception is caught and handled, the program resumes execution 
after the entire try statement that caught it—which is why we get the “continuing” 
message here. We don’t see the standard error message, and the program continues on 
its way normally. 
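
The server scenario mentioned at the start of this section might be sketched like this. The serve function and its list of request tuples are hypothetical stand-ins for a real request loop; the point is only that one failing request does not end the loop:

```python
def fetcher(obj, index):
    return obj[index]

def serve(requests):
    replies = []
    for obj, index in requests:                # One bad request should not stop the loop
        try:
            replies.append(fetcher(obj, index))
        except IndexError:
            replies.append('bad index')        # Recover, then continue with the next request
    return replies

print(serve([('spam', 3), ('spam', 4), ('ham', 0)]))
```

Because the try is coded inside the loop, execution resumes after it on each iteration, and the remaining requests are still processed.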


Raising Exceptions 


So far, we’ve been letting Python raise exceptions for us by making mistakes (on pur- 
pose this time!), but our scripts can raise exceptions too—that is, exceptions can be 
raised by Python or by your program, and can be caught or not. To trigger an exception 
manually, simply run a raise statement. User-triggered exceptions are caught the same 
way as those Python raises. The following may not be the most useful Python code ever 
penned, but it makes the point: 

>>> try:
...     raise IndexError               # Trigger exception manually
... except IndexError:
...     print('got exception')
...
got exception


As usual, if they’re not caught, user-triggered exceptions are propagated up to the top- 
level default exception handler and terminate the program with a standard error 
message: 

>>> raise IndexError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError 
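
A raise can also name an instance created with a message, which then shows up in the error text. The following sketch uses a hypothetical check_index function, and it assumes the except/as form (covered later in this part of the book) to fetch the raised instance:

```python
def check_index(seq, index):
    if index >= len(seq):
        raise IndexError('index %d is too large' % index)   # Raise an instance with a message
    return seq[index]

try:
    check_index('spam', 4)
except IndexError as exc:                      # Caught the same way as Python-raised errors
    message = str(exc)

print(message)
```

If the try were omitted here, the same message would appear after the exception name on the last line of the default handler's error text.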


As we'll see in the next chapter, the assert statement can be used to trigger exceptions, 
too—it’s a conditional raise, used mostly for debugging purposes during development: 


>>> assert False, 'Nobody expects the Spanish Inquisition!'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Nobody expects the Spanish Inquisition!
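
To make the "conditional raise" idea concrete, here is a minimal sketch comparing an assert with a roughly equivalent explicit test and raise. The function names are hypothetical and exist only for this illustration; note also that assert statements are skipped entirely when Python runs with the -O flag, which the explicit form is not:

```python
def check_with_assert(value):
    # Raises AssertionError with this message if the test is false
    assert value >= 0, 'value must not be negative'
    return value

def check_with_raise(value):
    # Roughly equivalent explicit form: test, then raise by hand
    if not value >= 0:
        raise AssertionError('value must not be negative')
    return value

for checker in (check_with_assert, check_with_raise):
    try:
        checker(-1)
    except AssertionError as exc:
        print(checker.__name__, '->', exc)
```

Both calls end up in the same handler, which is the sense in which an assert is just a raise guarded by a condition.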


User-Defined Exceptions


The raise statement introduced in the prior section raises a built-in exception defined 
in Python’s built-in scope. As you'll learn later in this part of the book, you can also 
define new exceptions of your own that are specific to your programs. User-defined 
exceptions are coded with classes, which inherit from a built-in exception class: usually 
the class named Exception. Class-based exceptions allow scripts to build exception 
categories, inherit behavior, and have attached state information: 


>>> class Bad(Exception):             # User-defined exception
...     pass
...
>>> def doomed():
...     raise Bad()                   # Raise an instance
...
>>> try:
...     doomed()
... except Bad:                       # Catch class name
...     print('got Bad')
...
got Bad
>>>
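
As a hedged sketch of the three benefits named above (categories, inherited behavior, and attached state), the following uses hypothetical class and function names, AppError, ParseError, and parse, that are not part of any library. It is written for Python 3.X:

```python
class AppError(Exception):            # Hypothetical category superclass
    pass

class ParseError(AppError):           # Specific exception within the category
    def __init__(self, line, message):
        super().__init__(message)     # Inherit the built-in display behavior
        self.line = line              # Attached state information

def parse(text):
    raise ParseError(42, 'unexpected token in ' + repr(text))

try:
    parse('spam')
except AppError as exc:               # Catches ParseError via its superclass
    print('caught', type(exc).__name__, 'at line', exc.line, '-', exc)
```

Naming the superclass in the except clause catches every exception in the category, while the instance carries whatever details the raiser chose to attach.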


Termination Actions 


Finally, try statements can say “finally”—that is, they may include finally blocks. 
These look like except handlers for exceptions, but the try/finally combination speci- 
fies termination actions that always execute “on the way out,” regardless of whether 
an exception occurs in the try block: 
>>> try:
...     fetcher(x, 3)
... finally:                          # Termination actions
...     print('after fetch')
...
'm'
after fetch
>>>


Here, if the try block finishes without an exception, the finally block will run, and 
the program will resume after the entire try. In this case, this statement seems a bit 
silly—we might as well have simply typed the print right after a call to the function, 
and skipped the try altogether: 


fetcher(x, 3) 
print('after fetch')


There is a problem with coding this way, though: if the function call raises an exception, 
the print will never be reached. The try/finally combination avoids this pitfall—when 
an exception does occur in a try block, finally blocks are executed while the program 
is being unwound: 
>>> def after():
...     try:
...         fetcher(x, 4)
...     finally:
...         print('after fetch')
...     print('after try?')
...
>>> after()
after fetch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in after
  File "<stdin>", line 2, in fetcher
IndexError: string index out of range
>>>


Here, we don’t get the “after try?” message because control does not resume after the 
try/finally block when an exception occurs. Instead, Python jumps back to run the 
finally action, and then propagates the exception up to a prior handler (in this case, 
to the default handler at the top). If we change the call inside this function so as not to 
trigger an exception, the finally code still runs, but the program continues after the try: 
>>> def after():
...     try:
...         fetcher(x, 3)
...     finally:
...         print('after fetch')
...     print('after try?')
...
>>> after()
after fetch
after try?
>>>


In practice, try/except combinations are useful for catching and recovering from ex- 
ceptions, and try/finally combinations come in handy to guarantee that termination 
actions will fire regardless of any exceptions that may occur in the try block’s code. 
For instance, you might use try/except to catch errors raised by code that you import 
from a third-party library, and try/finally to ensure that calls to close files or terminate 
server connections are always run. We’ll see some such practical examples later in this 
part of the book. 


Although they serve conceptually distinct purposes, as of Python 2.5, we can now mix 
except and finally clauses in the same try statement—the finally is run on the way 
out regardless of whether an exception was raised, and regardless of whether the ex- 
ception was caught by an except clause. 


As we'll learn in the next chapter, Python 2.6 and 3.0 provide an alternative to try/ 
finally when using some types of objects. The with/as statement runs an object’s con- 
text management logic to guarantee that termination actions occur: 




>>> with open('lumberjack.txt', 'w') as file:     # Always close file on exit
...     file.write('The larch!\n')


Although this option requires fewer lines of code, it’s only applicable when processing 
certain object types, so try/finally is a more general termination structure. On the 
other hand, with/as may also run startup actions and supports user-defined context
management code.
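
To give a rough idea of what user-defined context management code looks like, here is a minimal sketch. The class name Tracer is hypothetical; the __enter__ and __exit__ method names are the ones the with statement calls on entry and exit:

```python
class Tracer:                          # Hypothetical context manager class
    def __enter__(self):
        print('starting')              # Startup action
        return self                    # Value assigned to the "as" target

    def __exit__(self, exc_type, exc_value, traceback):
        print('exiting, exception type:', exc_type)   # Termination action
        return False                   # False: let any exception propagate

with Tracer():
    print('in the with block')

try:
    with Tracer():
        raise IndexError('oops')
except IndexError:
    print('exception propagated after __exit__ ran')
```

The exit method runs in both cases, much as a finally block would, and receives the exception details when one is active.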


Why You Will Care: Error Checks 


One way to see how exceptions are useful is to compare coding styles in Python and 
languages without exceptions. For instance, if you want to write robust programs in 
the C language, you generally have to test return values or status codes after every 
operation that could possibly go astray, and propagate the results of the tests as your 
programs run: 


doStuff()
{                                        /* C program */
    if (doFirstThing() == ERROR)         /* Detect errors everywhere */
        return ERROR;                    /* even if not handled here */
    if (doNextThing() == ERROR)
        return ERROR;
    return doLastThing();
}

main()
{
    if (doStuff() == ERROR)
        badEnding();
    else
        goodEnding();
}


In fact, realistic C programs often have as much code devoted to error detection as to 
doing actual work. But in Python, you don’t have to be so methodical (and neurotic!). 
You can instead wrap arbitrarily vast pieces of a program in exception handlers and 
simply write the parts that do the actual work, assuming all is well: 


def doStuff():                  # Python code
    doFirstThing()              # We don't care about exceptions here,
    doNextThing()               # so we don't need to detect them
    doLastThing()

if __name__ == '__main__':
    try:
        doStuff()               # This is where we care about results,
    except:                     # so it's the only place we must check
        badEnding()
    else:
        goodEnding()

Because control jumps immediately to a handler when an exception occurs, there’s no
need to instrument all your code to guard for errors. Moreover, because Python detects 
errors automatically, your code usually doesn’t need to check for errors in the first 
place. The upshot is that exceptions let you largely ignore the unusual cases and avoid 
error-checking code. 


Chapter Summary 


And that is the majority of the exception story; exceptions really are a simple tool. 


To summarize, Python exceptions are a high-level control flow device. They may be 
raised by Python, or by your own programs. In both cases, they may be ignored (to 
trigger the default error message), or caught by try statements (to be processed by your 
code). The try statement comes in two logical formats that, as of Python 2.5, can be 
combined—one that handles exceptions, and one that executes finalization code re- 
gardless of whether exceptions occur or not. Python’s raise and assert statements 
trigger exceptions on demand (both built-ins and new exceptions we define with 
classes); the with/as statement is an alternative way to ensure that termination actions 
are carried out for objects that support it. 


In the rest of this part of the book, we’ll fill in some of the details about the statements 
involved, examine the other sorts of clauses that can appear under a try, and discuss 
class-based exception objects. The next chapter begins our tour by taking a closer look 
at the statements we introduced here. Before you turn the page, though, here are a few 
quiz questions to review. 


Test Your Knowledge: Quiz 


1. Name three things that exception processing is good for.
2. What happens to an exception if you don’t do anything special to handle it?
3. How can your script recover from an exception?
4. Name two ways to trigger exceptions in your script.
5. Name two ways to specify actions to be run at termination time, whether an
exception occurs or not.


Test Your Knowledge: Answers 


1. Exception processing is useful for error handling, termination actions, and event 
notification. It can also simplify the handling of special cases and can be used to 
implement alternative control flows. In general, exception processing also cuts 
down on the amount of error-checking code your program may require—because
all errors filter up to handlers, you may not need to test the outcome of every 
operation. 


2. Any uncaught exception eventually filters up to the default exception handler Py- 
thon provides at the top of your program. This handler prints the familiar error 
message and shuts down your program. 


3. If you don’t want the default message and shutdown, you can code try/except 
statements to catch and recover from exceptions that are raised. Once an exception 
is caught, the exception is terminated and your program continues. 


4. The raise and assert statements can be used to trigger an exception, exactly as if 
it had been raised by Python itself. In principle, you can also raise an exception by 
making a programming mistake, but that’s not usually an explicit goal! 


5. The try/finally statement can be used to ensure actions are run after a block of 
code exits, regardless of whether it raises an exception or not. The with/as state- 
ment can also be used to ensure termination actions are run, but only when pro- 
cessing object types that support it. 


CHAPTER 33
Exception Coding Details


In the prior chapter we took a quick look at exception-related statements in action. 
Here, we’re going to dig a bit deeper—this chapter provides a more formal introduction 
to exception processing syntax in Python. Specifically, we’ll explore the details behind 
the try, raise, assert, and with statements. As we’ll see, although these statements are 
mostly straightforward, they offer powerful tools for dealing with exceptions in Python 
code. 


One procedural note up front: The exception story has changed in major
ways in recent years. As of Python 2.5, the finally clause can appear in
the same try statement as except and else clauses (previously, they
could not be combined). Also, as of Python 3.0 and 2.6, the new with
context manager statement has become official, and user-defined
exceptions must now be coded as class instances, which should inherit
from a built-in exception superclass. Moreover, 3.0 sports slightly
modified syntax for the raise statement and except clauses. I will focus on
the state of exceptions in Python 2.6 and 3.0 in this edition, but because
you are still very likely to see the original techniques in code for some
time to come, along the way I’ll point out how things have evolved in
this domain.


The try/except/else Statement 


Now that we’ve seen the basics, it’s time for the details. In the following discussion,
I’ll first present try/except/else and try/finally as separate statements, because in
versions of Python prior to 2.5 they serve distinct roles and cannot be combined. As
mentioned in the preceding note, in Python 2.5 and later except and finally can be
mixed in a single try statement; I’ll explain the implications of this change after we’ve
explored the two original forms in isolation.


The try is a compound statement; its most complete form is sketched below. It starts 
with a try header line, followed by a block of (usually) indented statements, then one 
or more except clauses that identify exceptions to be caught, and an optional else clause
at the end. The words try, except, and else are associated by indenting them to the 
same level (i.e., lining them up vertically). For reference, here’s the general format in 
Python 3.0: 


try:
    <statements>              # Run this main action first
except <name1>:
    <statements>              # Run if name1 is raised during try block
except (name2, name3):
    <statements>              # Run if any of these exceptions occur
except <name4> as <data>:
    <statements>              # Run if name4 is raised, and get instance raised
except:
    <statements>              # Run for all (other) exceptions raised
else:
    <statements>              # Run if no exception was raised during try block


In this statement, the block under the try header represents the main action of the 
statement—the code you're trying to run. The except clauses define handlers for ex- 
ceptions raised during the try block, and the else clause (if coded) provides a handler 
to be run if no exceptions occur. The <data> entry here has to do with a feature of 
raise statements and exception classes, which we will discuss later in this chapter. 


Here’s how try statements work. When a try statement is entered, Python marks the 
current program context so it can return to it if an exception occurs. The statements 
nested under the try header are run first. What happens next depends on whether 
exceptions are raised while the try block’s statements are running: 


* If an exception does occur while the try block’s statements are running, Python
  jumps back to the try and runs the statements under the first except clause that
  matches the raised exception. Control resumes below the entire try statement after
  the except block runs (unless the except block raises another exception).

* If an exception happens in the try block and no except clause matches, the
  exception is propagated up to the last matching try statement that was entered in
  the program or, if it’s the first such statement, to the top level of the process (in
  which case Python kills the program and prints a default error message).

* If no exception occurs while the statements under the try header run, Python runs
  the statements under the else line (if present), and control then resumes below the
  entire try statement.


In other words, except clauses catch any exceptions that happen while the try block is 
running, and the else clause runs only if no exceptions happen while the try block runs. 
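
The three outcomes just described can be sketched with a small function. The name classify is hypothetical and used only for illustration; the indexing operation stands in for any main action:

```python
def classify(sequence, index):         # Hypothetical helper for illustration
    try:
        item = sequence[index]         # Main action
    except IndexError:
        return 'handled IndexError'    # Runs if IndexError is raised
    else:
        return 'no exception, got %r' % item   # Runs if no exception occurs

print(classify('spam', 3))             # else clause runs
print(classify('spam', 4))             # except clause runs
try:
    classify(None, 0)                  # TypeError: no except clause matches
except TypeError:
    print('TypeError propagated to an enclosing try')
```

The last call shows the second case in the list: because no except clause in classify names TypeError, the exception passes through to the enclosing try.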


except clauses are focused exception handlers—they catch exceptions that occur only 
within the statements in the associated try block. However, as the try block’s state- 
ments can call functions coded elsewhere in a program, the source of an exception may 
be outside the try statement itself. I’ll have more to say about this when we explore
try nesting in Chapter 35. 


try Statement Clauses


When you write a try statement, a variety of clauses can appear after the try header. 
Table 33-1 summarizes all the possible forms—you must use at least one. We’ve already 
met some of these: as you know, except clauses catch exceptions, finally clauses run 
on the way out, and else clauses run if no exceptions are encountered. 


Syntactically, there may be any number of except clauses, but you can code else only 
if there is at least one except, and there can be only one else and one finally. Through 
Python 2.4, the finally clause must appear alone (without else or except); the try/ 
finally is really a different statement. As of Python 2.5, however, a finally can appear 
in the same statement as except and else (more on the ordering rules later in this chapter
when we meet the unified try statement). 


Table 33-1. try statement clause forms

Clause form                        Interpretation
except:                            Catch all (or all other) exception types.
except name:                       Catch a specific exception only.
except name as value:              Catch the listed exception and its instance.
except (name1, name2):             Catch any of the listed exceptions.
except (name1, name2) as value:    Catch any listed exception and its instance.
else:                              Run if no exceptions are raised.
finally:                           Always perform this block.


We'll explore the entries with the extra as value part when we meet the raise statement. 
They provide access to the objects that are raised as exceptions. 
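
As a brief preview, and only a sketch, the following shows the as value form giving a handler access to the instance that was raised. The helper name lookup is hypothetical; the args attribute is the tuple of arguments stored on built-in exception instances:

```python
def lookup(container, key):            # Hypothetical helper for illustration
    try:
        return container[key]
    except (KeyError, IndexError) as exc:     # exc is the instance raised
        return '%s with args %r' % (type(exc).__name__, exc.args)

print(lookup({}, 'missing'))           # KeyError from a dictionary
print(lookup([], 3))                   # IndexError from a list
```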


The first and fourth entries in Table 33-1 are new here: 


* except clauses that list no exception name (except:) catch all exceptions not
  previously listed in the try statement.

* except clauses that list a set of exceptions in parentheses (except (e1, e2, e3):)
  catch any of the listed exceptions.


Because Python looks for a match within a given try by inspecting the except clauses 
from top to bottom, the parenthesized version has the same effect as listing each ex- 
ception in its own except clause, but you have to code the statement body only once. 
Here’s an example of multiple except clauses at work, which demonstrates just how 
specific your handlers can be: 

try:
    action()
except NameError:
    ...
except IndexError:
    ...
except KeyError:
    ...
except (AttributeError, TypeError, SyntaxError):
    ...
else:
    ...


In this example, if an exception is raised while the call to the action function is running, 
Python returns to the try and searches for the first except that names the exception 
raised. It inspects the except clauses from top to bottom and left to right, and runs the 
statements under the first one that matches. If none match, the exception is propagated 
past this try. Note that the else runs only when no exception occurs in action—it does 
not run when an exception without a matching except is raised. 


If you really want a general “catch-all” clause, an empty except does the trick: 


try:
    action()
except NameError:
    ...                       # Handle NameError
except IndexError:
    ...                       # Handle IndexError
except:
    ...                       # Handle all other exceptions
else:
    ...                       # Handle the no-exception case


The empty except clause is a sort of wildcard feature—because it catches everything, it 
allows your handlers to be as general or specific as you like. In some scenarios, this 
form may be more convenient than listing all possible exceptions in a try. For example, 
the following catches everything without listing anything: 
try:
    action()
except:
    ...                       # Catch all possible exceptions


Empty excepts also raise some design issues, though. Although convenient, they may 
catch unexpected system exceptions unrelated to your code, and they may inadver- 
tently intercept exceptions meant for another handler. For example, even system exit 
calls in Python trigger exceptions, and you usually want these to pass. That said, this 
structure may also catch genuine programming mistakes for which you probably
want to see an error message. We’ll revisit this as a gotcha at the end of this part of the
book. For now, I’ll just say “use with care.” 
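
Both hazards can be seen in a short sketch. The helper name run_and_swallow is hypothetical; sys.exit and sys.exc_info are standard library calls:

```python
import sys

def run_and_swallow(action):           # Hypothetical helper for illustration
    try:
        action()
    except:                            # Empty except: catches everything
        return type(sys.exc_info()[1]).__name__
    return 'no exception'

print(run_and_swallow(lambda: sys.exit(1)))       # Exit request is intercepted
print(run_and_swallow(lambda: undefined_name))    # So is a programming mistake
```

In the first call the program keeps running even though an exit was requested; in the second, a misspelled or undefined name is silently absorbed instead of being reported.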


Python 3.0 introduced an alternative that solves one of these problems—catching an 
exception named Exception has almost the same effect as an empty except, but ignores 
exceptions related to system exits: 
try:
    action()
except Exception:
    ...                       # Catch all possible exceptions, except exits

This has most of the same convenience of the empty except, but also most of the same
dangers. We’ll explore how this form works its voodoo in the next chapter, when we 
study exception classes. 
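
Until then, a small sketch can at least demonstrate the difference in behavior. The helper name run_guarded is hypothetical; the example relies on the fact that SystemExit is not a subclass of Exception in Python 3.X:

```python
import sys

def run_guarded(action):               # Hypothetical helper for illustration
    try:
        action()
    except Exception as exc:           # Does not match SystemExit
        return 'caught ' + type(exc).__name__
    return 'no exception'

print(run_guarded(lambda: 1 / 0))      # ZeroDivisionError is caught
try:
    run_guarded(lambda: sys.exit(1))   # SystemExit passes through the handler
except SystemExit:
    print('SystemExit propagated')
```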


Version skew note: Python 3.0 requires the except E as V: handler clause 
form listed in Table 33-1 and used in this book, rather than the older 
except E, V: form. The latter form is still available (but not 
recommended) in Python 2.6: if used, it’s converted to the former. The 
change was made to eliminate errors that occur when confusing the 
older form with two alternate exceptions, properly coded in 2.6 as 
except (E1, E2):. Because 3.0 supports the as form only, commas in a 
handler clause are always taken to mean a tuple, regardless of whether 
parentheses are used or not, and the values are interpreted as alternative 
exceptions to be caught. This change also modifies the scoping rules: 
with the new as syntax, the variable V is deleted at the end of the 
except block. 
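
The scoping change can be checked with a short sketch, assuming Python 3.X. The variable names err and saved are arbitrary; assigning the instance to a second name is one way to keep it after the handler exits:

```python
try:
    raise IndexError('spam')
except IndexError as err:
    saved = err                        # Keep the instance under another name
    print('inside handler:', err)

try:
    err                                # Name was removed when the handler exited
except NameError:
    print('err is no longer defined')

print('saved still refers to:', repr(saved))
```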


The try else Clause 


The purpose of the else clause is not always immediately obvious to Python newcom- 
ers. Without it, though, there is no way to tell (without setting and checking Boolean 
flags) whether the flow of control has proceeded past a try statement because no ex- 
ception was raised, or because an exception occurred and was handled: 
try:
    ...run code...
except IndexError:
    ...handle exception...
# Did we get here because the try failed or not?


Much like the way else clauses in loops make the exit cause more apparent, the else 
clause provides syntax in a try that makes what has happened obvious and 
unambiguous: 


try:
    ...run code...
except IndexError:
    ...handle exception...
else:
    ...no exception occurred...


You can almost emulate an else clause by moving its code into the try block: 
try:
    ...run code...
    ...no exception occurred...
except IndexError:
    ...handle exception...


This can lead to incorrect exception classifications, though. If the “no exception oc- 
curred” action triggers an IndexError, it will register as a failure of the try block and 
erroneously trigger the exception handler below the try (subtle, but true!). By using an
explicit else clause instead, you make the logic more obvious and guarantee that 
except handlers will run only for real failures in the code you’re wrapping in a try, not 
for failures in the else case’s action. 
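
The misclassification is easier to see in running code. In this sketch all four function names are hypothetical, and report contains a deliberate bug so that the follow-up action itself raises an IndexError:

```python
def fetch(data, index):
    return data[index]

def report(item):                      # Hypothetical follow-up action with a bug
    return item[10]                    # Raises IndexError on short items

def with_code_in_try(data):
    try:
        item = fetch(data, 0)
        report(item)                   # Failure here looks like a fetch failure
    except IndexError:
        return 'handler ran'
    return 'ok'

def with_else(data):
    try:
        item = fetch(data, 0)
    except IndexError:
        return 'handler ran'
    else:
        report(item)                   # Failure here is not caught by the handler
    return 'ok'

print(with_code_in_try(['spam']))      # Misclassified: handler runs
try:
    with_else(['spam'])
except IndexError:
    print('IndexError from the else block propagated')
```

In the first version the handler runs even though the fetch succeeded; in the second, the bug in the follow-up action surfaces as its own exception.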


Example: Default Behavior 


Because the control flow through a program is easier to capture in Python than in 
English, let’s run some examples that further illustrate exception basics. I’ve mentioned 
that exceptions not caught by try statements percolate up to the top level of the Python 
process and run Python’s default exception-handling logic (i.e., Python terminates the 
running program and prints a standard error message). Let’s look at an example. Run- 
ning the following module file, bad.py, generates a divide-by-zero exception: 


def gobad(x, y):
    return x / y

def gosouth(x):
    print(gobad(x, 0))

gosouth(1)


Because the program ignores the exception it triggers, Python kills the program and 
prints a message: 
% python bad.py
Traceback (most recent call last):
  File "bad.py", line 7, in <module>
    gosouth(1)
  File "bad.py", line 5, in gosouth
    print(gobad(x, 0))
  File "bad.py", line 2, in gobad
    return x / y
ZeroDivisionError: int division or modulo by zero


I ran this in a shell window with Python 3.0. The message consists of a stack trace
(“Traceback”) and the name of and details about the exception that was raised. The 
stack trace lists all lines active when the exception occurred, from oldest to newest. 
Note that because we’re not working at the interactive prompt, in this case the file and 
line number information is more useful. For example, here we can see that the bad 
divide happens at the last entry in the trace—line 2 of the file bad.py, a return
statement.*


Because Python detects and reports all errors at runtime by raising exceptions, excep- 
tions are intimately bound up with the ideas of error handling and debugging in general. 


* As mentioned in the prior chapter, the text of error messages and stack traces tends to vary slightly over time 
and shells. Don’t be alarmed if your error messages don’t exactly match mine. When I ran this example in 
Python 3.0’s IDLE GUI, for instance, its error message text showed filenames with full absolute directory 
paths. 

If you’ve worked through this book’s examples, you’ve undoubtedly seen an exception
or two along the way—even typos usually generate a SyntaxError or other exception 
when a file is imported or executed (that’s when the compiler is run). By default, you 
get a useful error display like the one just shown, which helps you track down the 
problem. 


Often, this standard error message is all you need to resolve problems in your code. 
For more heavy-duty debugging jobs, you can catch exceptions with try statements, 
or use one of the debugging tools that I introduced in Chapter 3 and will summarize 
again in Chapter 35 (such as the pdb standard library module). 


Example: Catching Built-in Exceptions 


Python’s default exception handling is often exactly what you want—especially for 
code in a top-level script file, an error generally should terminate your program imme- 
diately. For many programs, there is no need to be more specific about errors in your 
code. 


Sometimes, though, you’ll want to catch errors and recover from them instead. If you 
don’t want your program terminated when Python raises an exception, simply catch it 
by wrapping the program logic in a try. This is an important capability for programs 
such as network servers, which must keep running persistently. For example, the fol- 
lowing code catches and recovers from the TypeError Python raises immediately when 
you try to concatenate a list and a string (the + operator expects the same sequence type 
on both sides): 


def kaboom(x, y):
    print(x + y)                 # Trigger TypeError

try:
    kaboom([0,1,2], "spam")
except TypeError:                # Catch and recover here
    print('Hello world!')
print('resuming here')           # Continue here if exception or not


When the exception occurs in the function kaboom, control jumps to the try statement’s 
except clause, which prints a message. Since an exception is “dead” after it’s been 
caught like this, the program continues executing below the try rather than being ter- 
minated by Python. In effect, the code processes and clears the error, and your script 
recovers: 

% python kaboom.py
Hello world!
resuming here


Notice that once you’ve caught an error, control resumes at the place where you caught 
it (i.e., after the try); there is no direct way to go back to the place where the exception 
occurred (here, in the function kaboom). In a sense, this makes exceptions more like 
simple jumps than function calls—there is no way to return to the code that triggered
the error. 
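
One consequence is that retrying an operation means running it again from the start, typically by wrapping the try in a loop. The following is only a sketch of that idea; the names flaky and retry, and the choice of IOError, are assumptions made for illustration:

```python
def flaky(attempt):                    # Hypothetical operation: fails twice
    if attempt < 3:
        raise IOError('attempt %d failed' % attempt)
    return 'succeeded on attempt %d' % attempt

def retry(limit):
    for attempt in range(1, limit + 1):
        try:
            return flaky(attempt)      # Rerun the operation from the start
        except IOError as exc:
            print('recovering:', exc)  # Control resumes here, not in flaky
    return 'gave up'

print(retry(5))
```

Each failed call is abandoned entirely; the loop simply starts a fresh call rather than resuming the old one.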


The try/finally Statement 


The other flavor of the try statement is a specialization that has to do with finalization 
actions. If a finally clause is included in a try, Python will always run its block of 
statements “on the way out” of the try statement, whether an exception occurred while 
the try block was running or not. Its general form is: 


try: 
<statements> # Run this action first 
finally: 
<statements> # Always run this code on the way out 


With this variant, Python begins by running the statement block associated with the 
try header line. What happens next depends on whether an exception occurs during 
the try block: 


* If no exception occurs while the try block is running, Python jumps back to run
  the finally block and then continues execution below the try statement.

* If an exception does occur during the try block’s run, Python still comes back and
  runs the finally block, but it then propagates the exception up to a higher try or
  the top-level default handler; the program does not resume execution below the
  try statement. That is, the finally block is run even if an exception is raised, but
  unlike an except, the finally does not terminate the exception—it continues being
  raised after the finally block runs.


The try/finally form is useful when you want to be completely sure that an action will 
happen after some code runs, regardless of the exception behavior of the program. In 
practice, it allows you to specify cleanup actions that always must occur, such as file 
closes and server disconnects. 


Note that the finally clause cannot be used in the same try statement as except and 
else in Python 2.4 and earlier, so the try/finally is best thought of as a distinct state- 
ment form if you are using an older release. In Python 2.5 and later, however,
finally can appear in the same statement as except and else, so today there is really a 
single try statement with many optional clauses (more about this shortly). Whichever 
version you use, though, the finally clause still serves the same purpose—to specify 
“cleanup” actions that must always be run, regardless of any exceptions. 


As we’ll also see later in this chapter, in Python 2.6 and 3.0, the new
with statement and its context managers provide an object-based way
to do similar work for exit actions. Unlike finally, this new statement
also supports entry actions, but it is limited in scope to objects that
implement the context manager protocol.


Example: Coding Termination Actions with try/finally


We saw some simple try/finally examples in the prior chapter. Here’s a more realistic 
example that illustrates a typical role for this statement: 


class MyError(Exception): pass

def stuff(file):
    raise MyError()

file = open('data', 'w')         # Open an output file
try:
    stuff(file)                  # Raises exception
finally:
    file.close()                 # Always close file to flush output buffers
print('not reached')             # Continue here only if no exception


In this code, we’ve wrapped a call to a file-processing function in a try with a 
finally clause to make sure that the file is always closed, and thus finalized, whether 
the function triggers an exception or not. This way, later code can be sure that the file’s 
output buffer’s content has been flushed from memory to disk. A similar code structure 
can guarantee that server connections are closed, and so on. 


As we learned in Chapter 9, file objects are automatically closed on garbage collection; 
this is especially useful for temporary files that we don’t assign to variables. However, 
it’s not always easy to predict when garbage collection will occur, especially in larger 
programs. The try statement makes file closes more explicit and predictable and per- 
tains to a specific block of code. It ensures that the file will be closed on block exit, 
regardless of whether an exception occurs or not. 


This particular example’s function isn’t all that useful (it just raises an exception), but 
wrapping calls in try/finally statements is a good way to ensure that your closing-time 
(i.e., termination) activities always run. Again, Python always runs the code in your 
finally blocks, regardless of whether an exception happens in the try block.†


When the function here raises its exception, the control flow jumps back and runs the 
finally block to close the file. The exception is then propagated on to either another 
try or the default top-level handler, which prints the standard error message and shuts 
down the program; the statement after this try is never reached. If the function here 
did not raise an exception, the program would still execute the finally block to close 
the file, but it would then continue below the entire try statement. 


Notice that the user-defined exception here is again defined with a class—as we'll see 
in the next chapter, exceptions today must all be class instances in both 2.6 and 3.0. 


† Unless Python crashes completely, of course. It does a good job of avoiding this, though, by checking all
possible errors as a program runs. When a program does crash hard, it is usually due to a bug in linked-in C 
extension code, outside of Python’s scope. 


Unified try/except/finally


In all versions of Python prior to Release 2.5 (for its first 15 years of life, more or less), 
the try statement came in two flavors and was really two separate statements—we 
could either use a finally to ensure that cleanup code was always run, or write 
except blocks to catch and recover from specific exceptions and optionally specify an 
else clause to be run if no exceptions occurred. 


That is, the finally clause could not be mixed with except and else. This was partly 
because of implementation issues, and partly because the meaning of mixing the two 
seemed obscure—catching and recovering from exceptions seemed a disjoint concept 
from performing cleanup actions. 


In Python 2.5 and later, though (including 2.6 and 3.0, the versions used in this book), 
the two statements have merged. Today, we can mix finally, except, and else clauses 
in the same statement. That is, we can now write a statement of this form: 

try:                               # Merged form
    main-action
except Exception1:
    handler1
except Exception2:
    handler2
...
else:
    else-block
finally:
    finally-block

The code in this statement’s main-action block is executed first, as usual. If that code 
raises an exception, all the except blocks are tested, one after another, looking for a 
match to the exception raised. If the exception raised is Exception1, the handler1 block 
is executed; if it’s Exception2, handler2 is run, and so on. If no exception is raised, the 
else-block is executed. 


No matter what’s happened previously, the finally-block is executed once the main 
action block is complete and any raised exceptions have been handled. In fact, the code 
in the finally-block will be run even if there is an error in an exception handler or the 
else-block and a new exception is raised. 


As always, the finally clause does not end the exception—if an exception is active 
when the finally-block is executed, it continues to be propagated after the finally- 
block runs, and control jumps somewhere else in the program (to another try, or to 
the default top-level handler). If no exception is active when the finally is run, control 
resumes after the entire try statement. 


The net effect is that the finally is always run, regardless of whether: 


* An exception occurred in the main action and was handled.

* An exception occurred in the main action and was not handled.

* No exceptions occurred in the main action.

* A new exception was triggered in one of the handlers.


Again, the finally serves to specify cleanup actions that must always occur on the way 
out of the try, regardless of what exceptions have been raised or handled. 
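
The last case in the list above, a new exception raised inside a handler, can be sketched as follows. This is only an illustration: the run_case function and the events list are hypothetical names chosen here to record the order in which clauses run.

```python
def run_case(events):
    """Record the order in which clauses run when a handler itself fails."""
    try:
        events.append('main action')
        'spam'[99]                       # Raises IndexError
    except IndexError:
        events.append('handler')
        raise TypeError('new error')     # New exception raised in the handler
    finally:
        events.append('finally')         # Still runs on the way out


events = []
try:
    run_case(events)
except TypeError as exc:
    events.append('outer caught: %s' % exc)

print(events)
```

The finally entry lands between the handler and the outer catch, which suggests that the cleanup code ran before the new exception left the statement.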


Unified try Statement Syntax 


When combined like this, the try statement must have either an except or a finally, 
and the order of its parts must be like this: 


try -> except -> else -> finally 


where the else and finally are optional, and there may be zero or more except, but 
there must be at least one except if an else appears. Really, the try statement consists 
of two parts: excepts with an optional else, and/or the finally. 


In fact, it’s more accurate to describe the merged statement’s syntactic form this way 
(square brackets mean optional and star means zero-or-more here): 

try:                               # Format 1
    statements
except [type [as value]]:          # [type [, value]] in Python 2
    statements
[except [type [as value]]:
    statements]*
[else:
    statements]
[finally:
    statements]

try:                               # Format 2
    statements
finally:
    statements


Because of these rules, the else can appear only if there is at least one except, and it’s 
always possible to mix except and finally, regardless of whether an else appears or 
not. It’s also possible to mix finally and else, but only if an except appears too (though 
the except can omit an exception name to catch everything and run a raise statement, 
described later, to reraise the current exception). If you violate any of these ordering 
rules, Python will raise a syntax error exception before your code runs. 
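
One way to check these rules without crashing a script is to hand candidate source text to the built-in compile function, which reports ordering violations as SyntaxError. The helper name compiles and the three source strings below are assumptions made for this sketch.

```python
def compiles(source):
    """Return True if source is accepted by the compiler, False on SyntaxError."""
    try:
        compile(source, '<demo>', 'exec')
        return True
    except SyntaxError:
        return False


try_else_finally = "try:\n    pass\nelse:\n    pass\nfinally:\n    pass\n"
try_except_else_finally = (
    "try:\n    pass\nexcept Exception:\n    pass\n"
    "else:\n    pass\nfinally:\n    pass\n")
finally_before_except = (
    "try:\n    pass\nfinally:\n    pass\nexcept Exception:\n    pass\n")

print(compiles(try_else_finally))            # else without an except
print(compiles(try_except_else_finally))     # Full merged form
print(compiles(finally_before_except))       # Clauses out of order
```

Only the full merged form should be accepted; the other two strings break the ordering rules and are rejected at compile time, before any code runs.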


Combining finally and except by Nesting 


Even prior to Python 2.5, it was possible to combine finally and except clauses in a
try by syntactically nesting a try/except in the try block of a try/finally statement 
(we'll explore this technique more fully in Chapter 35). In fact, the following has the 
same effect as the new merged form shown at the start of this section: 

try:                               # Nested equivalent to merged form
    try:
        main-action
    except Exception1:
        handler1
    except Exception2:
        handler2
    ...
    else:
        no-error
finally:
    cleanup

Again, the finally block is always run on the way out, regardless of what happened in 
the main action and regardless of any exception handlers run in the nested try (trace 
through the four cases listed previously to see how this works the same). Since an 
else always requires an except, this nested form even sports the same mixing con- 
straints of the unified statement form outlined in the preceding section. 


However, this nested equivalent is more obscure and requires more code than the new 
merged form (one four-character line, at least). Mixing finally into the same statement 
makes your code easier to write and read, so this is the generally preferred technique 
today. 
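
To make the equivalence concrete, the following sketch runs the same actions through a merged statement and a nested one and compares the clauses each executes. The function names merged, nested, and trace are hypothetical helpers written for this comparison.

```python
def merged(action, log):
    try:
        action()
    except IndexError:
        log.append('except')
    else:
        log.append('else')
    finally:
        log.append('finally')


def nested(action, log):
    try:
        try:
            action()
        except IndexError:
            log.append('except')
        else:
            log.append('else')
    finally:
        log.append('finally')


def trace(form, action):
    """Run one form on one action; return the clauses run plus any escapee."""
    log = []
    try:
        form(action, log)
    except Exception as exc:
        log.append('propagated ' + type(exc).__name__)
    return log


actions = [lambda: 'spam'[99],     # IndexError: caught
           lambda: 'spam'[3],      # No exception
           lambda: 1 / 0]          # ZeroDivisionError: not caught

for action in actions:
    print(trace(merged, action), trace(nested, action))
```

Each pair of traces printed should match, whether the exception is caught, absent, or propagated.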


Unified try Example 


Here’s a demonstration of the merged try statement form at work. The following file, 
mergedexc.py, codes four common scenarios, with print statements that describe the 
meaning of each: 


sep = '-' * 32 + '\n'
print(sep + 'EXCEPTION RAISED AND CAUGHT')
try:
    x = 'spam'[99]
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')


print(sep + 'NO EXCEPTION RAISED')
try:
    x = 'spam'[3]
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')


print(sep + 'NO EXCEPTION RAISED, WITH ELSE')
try:
    x = 'spam'[3]
except IndexError:
    print('except run')
else:
    print('else run')
finally:
    print('finally run')
print('after run')


print(sep + 'EXCEPTION RAISED BUT NOT CAUGHT')
try:
    x = 1 / 0
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')


When this code is run, the following output is produced in Python 3.0 (actually, its 
behavior and output are the same in 2.6, because the print calls each print a single 
item). Trace through the code to see how exception handling produces the output of 
each of the four tests here: 


c:\misc> C:\Python30\python mergedexc.py
--------------------------------
EXCEPTION RAISED AND CAUGHT
except run
finally run
after run
--------------------------------
NO EXCEPTION RAISED
finally run
after run
--------------------------------
NO EXCEPTION RAISED, WITH ELSE
else run
finally run
after run
--------------------------------
EXCEPTION RAISED BUT NOT CAUGHT
finally run
Traceback (most recent call last):
  File "mergedexc.py", line 36, in <module>
    x = 1 / 0
ZeroDivisionError: int division or modulo by zero


This example uses built-in operations in the main action to trigger exceptions (or not), 
and it relies on the fact that Python always checks for errors as code is running. The 
next section shows how to raise exceptions manually instead. 

The raise Statement


To trigger exceptions explicitly, you can code raise statements. Their general form is 
simple—a raise statement consists of the word raise, optionally followed by the class 
to be raised or an instance of it: 


raise <instance> # Raise instance of class 
raise <class> # Make and raise instance of class 
raise # Reraise the most recent exception 


As mentioned earlier, exceptions are always instances of classes in Python 2.6 and 3.0. 
Hence, the first raise form here is the most common—we provide an instance directly, 
either created before the raise or within the raise statement itself. If we pass a class 
instead, Python calls the class with no constructor arguments, to create an instance to 
be raised; this form is equivalent to adding parentheses after the class reference. The 
last form reraises the most recently raised exception; it’s commonly used in exception 
handlers to propagate exceptions that have been caught. 


To make this clearer, let’s look at some examples. With built-in exceptions, the fol- 
lowing two forms are equivalent—both raise an instance of the exception class named, 
but the first creates the instance implicitly: 


raise IndexError # Class (instance created) 
raise IndexError() # Instance (created in statement) 


We can also create the instance ahead of time—because the raise statement accepts 
any kind of object reference, the following two examples raise IndexError just like the 
prior two: 


exc = IndexError() # Create instance ahead of time 
raise exc 


excs = [IndexError, TypeError] 
raise excs[0] 


When an exception is raised, Python sends the raised instance along with the exception. 
If a try includes an except name as X: clause, the variable X will be assigned the instance
provided in the raise: 


try:
    ...
except IndexError as X:            # X assigned the raised instance object
    ...


The as is optional in a try handler (if it’s omitted, the instance is simply not assigned 
to a name), but including it allows the handler to access both data in the instance and 
methods in the exception class. 
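
As a small sketch of these points, the code below raises IndexError both as a class and as an instance, and uses the as clause to fetch the instance in each case. The helper names catch, raise_class, and raise_instance are illustrative only.

```python
def catch(raiser):
    """Call raiser and return the exception instance it raises."""
    try:
        raiser()
    except IndexError as X:        # X is assigned the raised instance
        return X


def raise_class():
    raise IndexError               # Class: Python makes the instance


def raise_instance():
    raise IndexError('spam')       # Instance made in the statement


from_class = catch(raise_class)
from_instance = catch(raise_instance)

print(type(from_class), from_class.args)
print(type(from_instance), from_instance.args)
```

Both handlers receive an IndexError instance; the only difference is that the class form calls the constructor with no arguments, so its args tuple is empty.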


This model works the same for user-defined exceptions we code with classes—the 
following, for example, passes to the exception class constructor arguments that be- 
come available in the handler through the assigned instance: 




class MyExc(Exception): pass
...
raise MyExc('spam')                # Exception class with constructor args
...
try:
    ...
except MyExc as X:                 # Instance attributes available in handler
    print(X.args)


Because this encroaches on the next chapter’s topic, though, I'll defer further details
until then. 


Regardless of how you name them, exceptions are always identified by instance objects, 
and at most one is active at any given time. Once caught by an except clause anywhere 
in the program, an exception dies (i.e., won’t propagate to another try), unless it’s 
reraised by another raise statement or error. 


Propagating Exceptions with raise 


A raise statement that does not include an exception name or extra data value simply 
reraises the current exception. This form is typically used if you need to catch and 
handle an exception but don’t want the exception to die in your code: 

>>> try:
...     raise IndexError('spam')       # Exceptions remember arguments
... except IndexError:
...     print('propagating')
...     raise                          # Reraise most recent exception
...
propagating
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: spam


Running a raise this way reraises the exception and propagates it to a higher handler
(or the default handler at the top, which stops the program with a standard error
message). Notice how the argument we passed to the exception class shows up in the error
messages; you'll learn why this happens in the next chapter.
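
When a higher handler does exist, the reraised exception arrives there with its arguments intact. The following sketch assumes a hypothetical worker function that handles the error locally and then passes it on.

```python
def worker():
    try:
        raise IndexError('spam')
    except IndexError:
        print('propagating')       # Handle locally...
        raise                      # ...then reraise the same exception


try:
    worker()
except IndexError as exc:
    caught = exc                   # Keep a reference for use after the handler
    print('outer handler got', exc.args)
```

Here the outer try plays the role of the higher handler, so the program continues instead of ending with the default error message.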


Python 3.0 Exception Chaining: raise from 


Python 3.0 (but not 2.6) also allows raise statements to have an optional from clause: 


raise exception from otherexception 


When the from is used, the second expression specifies another exception class or
instance to attach to the raised exception’s __cause__ attribute. If the raised exception is
not caught, Python prints both exceptions as part of the standard error message:

>>> try:
...     1 / 0
... except Exception as E:
...     raise TypeError('Bad!') from E
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: int division or modulo by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: Bad!


When an exception is raised inside an exception handler, a similar procedure is fol- 
lowed implicitly: the previous exception is attached to the new exception’s 
__context__ attribute and is again displayed in the standard error message if the ex- 
ception goes uncaught. This is an advanced and still somewhat obscure extension, so 
see Python’s manuals for more details. 
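
The two attributes can also be inspected directly rather than read from an error message. In this Python 3 sketch, explicit_chain, implicit_chain, and capture are hypothetical helper names.

```python
def explicit_chain():
    try:
        1 / 0
    except Exception as E:
        raise TypeError('Bad!') from E       # Sets __cause__


def implicit_chain():
    try:
        1 / 0
    except Exception:
        raise TypeError('Bad!')              # Sets __context__ only


def capture(func):
    """Run func and return the TypeError it raises."""
    try:
        func()
    except TypeError as exc:
        return exc


explicit = capture(explicit_chain)
implicit = capture(implicit_chain)

print(type(explicit.__cause__).__name__)
print(implicit.__cause__, type(implicit.__context__).__name__)
```

With the from clause, the original ZeroDivisionError is recorded as the cause; without it, the cause is None and the original error is reachable only through the context attribute.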


Version skew note: Python 3.0 no longer supports the raise Exc, Args
form that is still available in Python 2.6. In 3.0, use the raise
Exc(Args) instance-creation call form described in this book instead.
The equivalent comma form in 2.6 is legacy syntax provided for
compatibility with the now defunct string-based exceptions model, and it’s
deprecated in 2.6. If used, it is converted to the 3.0 call form. As in earlier
releases, a raise Exc form is also allowed; it is converted to raise
Exc() in both versions, calling the class constructor with no arguments.


The assert Statement 


As a somewhat special case for debugging purposes, Python includes the assert state- 
ment. It is mostly just syntactic shorthand for a common raise usage pattern, and an 
assert can be thought of as a conditional raise statement. A statement of the form: 


assert <test>, <data> # The <data> part is optional 


works like the following code: 


if __debug__:
    if not <test>:
        raise AssertionError(<data>)


In other words, if the test evaluates to false, Python raises an exception: the data item 
(if it’s provided) is used as the exception’s constructor argument. Like all exceptions, 
the AssertionError exception will kill your program if it’s not caught with a try, in 
which case the data item shows up as part of the error message. 


As an added feature, assert statements may be removed from a compiled program’s
byte code if the -O Python command-line flag is used, thereby optimizing the program.
AssertionError is a built-in exception, and the __debug__ flag is a built-in name that is
automatically set to True unless the -O flag is used. Use a command line like python -O
main.py to run in optimized mode and disable asserts.
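
The equivalence between assert and its hand-coded expansion can be sketched by writing both and comparing their outcomes. The function names with_assert, with_raise, and outcome are assumptions made for this illustration, and the results described hold when Python is run without the -O flag.

```python
def with_assert(x):
    assert x < 0, 'x must be negative'
    return x ** 2


def with_raise(x):
    # Hand-expanded equivalent of the assert above
    if __debug__:
        if not x < 0:
            raise AssertionError('x must be negative')
    return x ** 2


def outcome(func, arg):
    """Return func's result, or the exception arguments if the test fails."""
    try:
        return func(arg)
    except AssertionError as exc:
        return exc.args


for func in (with_assert, with_raise):
    print(outcome(func, -3), outcome(func, 1))
```

Both versions return the square for a negative argument and raise AssertionError carrying the data item for a non-negative one.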


Example: Trapping Constraints (but Not Errors!) 


Assertions are typically used to verify program conditions during development. When 
displayed, their error message text automatically includes source code line information 
and the value listed in the assert statement. Consider the file asserter.py: 

def f(x):
    assert x < 0, 'x must be negative'
    return x ** 2

% python
>>> import asserter
>>> asserter.f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "asserter.py", line 2, in f
    assert x < 0, 'x must be negative'
AssertionError: x must be negative


It’s important to keep in mind that assert is mostly intended for trapping user-defined 
constraints, not for catching genuine programming errors. Because Python traps pro- 
gramming errors itself, there is usually no need to code asserts to catch things like out- 
of-bounds indexes, type mismatches, and zero divides: 

def reciprocal(x):
    assert x != 0                  # A useless assert!
    return 1 / x                   # Python checks for zero automatically


Such asserts are generally superfluous—because Python raises exceptions on errors 
automatically, you might as well let it do the job for you.* For another example of 
common assert usage, see the abstract superclass example in Chapter 28; there, we 
used assert to make calls to undefined methods fail with a message. 


with/as Context Managers 


Python 2.6 and 3.0 introduced a new exception-related statement—the with, and its 
optional as clause. This statement is designed to work with context manager objects, 
which support a new method-based protocol. This feature is also available as an option 
in 2.5, enabled with an import of this form: 


from __future__ import with_statement


* In most cases, at least. As suggested earlier in the book, if a function has to perform long-running or
unrecoverable actions before it reaches the place where an exception will be triggered, you still might want 
to test for errors. Even in this case, though, be careful not to make your tests overly specific or restrictive, or 
you will limit your code’s utility. 

In short, the with/as statement is designed to be an alternative to a common try/
finally usage idiom; like that statement, it is intended for specifying termination-time 
or “cleanup” activities that must run regardless of whether an exception occurs in a 
processing step. Unlike try/finally, though, the with statement supports a richer 
object-based protocol for specifying both entry and exit actions around a block of code. 


Python enhances some built-in tools with context managers, such as files that auto- 
matically close themselves and thread locks that automatically lock and unlock, but 
programmers can code context managers of their own with classes, too. 


Basic Usage 


The basic format of the with statement looks like this: 


with expression [as variable]: 
with-block 


The expression here is assumed to return an object that supports the context
management protocol (more on this protocol in a moment). This object may also return a value
that will be assigned to the name variable if the optional as clause is present.


Note that the variable is not necessarily assigned the result of the expression; the result 
of the expression is the object that supports the context protocol, and the variable may 
be assigned something else intended to be used inside the statement. The object re- 
turned by the expression may then run startup code before the with-block is started, 
as well as termination code after the block is done, regardless of whether the block 
raised an exception or not. 


Some built-in Python objects have been augmented to support the context management 
protocol, and so can be used with the with statement. For example, file objects (covered 
in Chapter 9) have a context manager that automatically closes the file after the with 
block regardless of whether an exception is raised: 

with open(r'C:\misc\data') as myfile:
    for line in myfile:
        print(line)
        ...more code here...

Here, the call to open returns a simple file object that is assigned to the name myfile. 
We can use myfile with the usual file tools—in this case, the file iterator reads line by 
line in the for loop. 


However, this object also supports the context management protocol used by the 
with statement. After this with statement has run, the context management machinery 
guarantees that the file object referenced by myfile is automatically closed, even if the 
for loop raised an exception while processing the file. 
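
The closing guarantee can be checked through the file object's closed attribute. This sketch substitutes a temporary scratch file for the book's data file, so the path and its contents are assumptions.

```python
import os
import tempfile

# Create a small scratch file to read (the path is generated, not from the text)
handle, path = tempfile.mkstemp(text=True)
with os.fdopen(handle, 'w') as scratch:
    scratch.write('line 1\nline 2\n')

try:
    with open(path) as myfile:
        for line in myfile:
            raise ValueError('failed while processing ' + line.strip())
except ValueError as exc:
    print('caught:', exc)

print(myfile.closed)      # The with statement closed the file on the way out
os.remove(path)
```

Even though the loop fails on its first line, the file is already closed by the time the exception reaches the enclosing handler.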


Although file objects are automatically closed on garbage collection, it’s not always 
straightforward to know when that will occur. The with statement in this role is an 
alternative that allows us to be sure that the close will occur after execution of a specific 
block of code. As we saw earlier, we can achieve a similar effect with the more general
and explicit try/finally statement, but it requires four lines of administrative code 
instead of one in this case: 

myfile = open(r'C:\misc\data')
try:
    for line in myfile:
        print(line)
        ...more code here...
finally:
    myfile.close()


We won't cover Python’s multithreading modules in this book (for more on that topic, 
see follow-up application-level texts such as Programming Python), but the lock and 
condition synchronization objects they define may also be used with the with statement, 
because they support the context management protocol: 

lock = threading.Lock()
with lock:
    # critical section of code
    ...access shared resources...


Here, the context management machinery guarantees that the lock is automatically 
acquired before the block is executed and released once the block is complete, regard- 
less of exception outcomes. 
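
A single-threaded sketch is enough to observe the acquire and release steps, using the lock's locked method to sample its state. The states list is a hypothetical name used only to record what was seen.

```python
import threading

lock = threading.Lock()
states = []

try:
    with lock:
        states.append(lock.locked())     # Held inside the block
        raise RuntimeError('failure in critical section')
except RuntimeError:
    pass

states.append(lock.locked())             # Released despite the exception
print(states)
```

The lock is held while the block runs and is free again afterward, even though the block exited with an exception.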


As introduced in Chapter 5, the decimal module also uses context managers to simplify 
saving and restoring the current decimal context, which specifies the precision and 
rounding characteristics for calculations: 

with decimal.localcontext() as ctx:
    ctx.prec = 2
    x = decimal.Decimal('1.00') / decimal.Decimal('3.00')


After this statement runs, the current thread’s context manager state is automatically 
restored to what it was before the statement began. To do the same with a try/ 
finally, we would need to save the context before and restore it manually. 
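
To see the restoration happen, the precision can be sampled before and after the with statement. The variable names below are illustrative; the calls are from the standard decimal module.

```python
import decimal

before = decimal.getcontext().prec       # Precision in effect on entry

with decimal.localcontext() as ctx:
    ctx.prec = 2                         # Applies only inside the block
    inside = decimal.Decimal('1.00') / decimal.Decimal('3.00')

after = decimal.getcontext().prec        # Restored on exit
outside = decimal.Decimal('1.00') / decimal.Decimal('3.00')

print(inside, before == after)
print(outside)
```

The division inside the block is rounded to two significant digits, while the same division afterward uses the original precision again.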


The Context Management Protocol 


Although some built-in types come with context managers, we can also write new ones 
of our own. To implement context managers, classes use special methods that fall into 
the operator overloading category to tap into the with statement. The interface expected 
of objects used in with statements is somewhat complex, and most programmers only 
need to know how to use existing context managers. For tool builders who might want 
to write new application-specific context managers, though, let’s take a quick look at 
what’s involved. 


Here’s how the with statement actually works: 

1. The expression is evaluated, resulting in an object known as a context manager that
must have __enter__ and __exit__ methods.

2. The context manager’s __enter__ method is called. The value it returns is assigned
to the variable in the as clause if present, or simply discarded otherwise.

3. The code in the nested with block is executed.

4. If the with block raises an exception, the __exit__(type, value, traceback) method
is called with the exception details. Note that these are the same values returned
by sys.exc_info, described in the Python manuals and later in this part of the book.
If this method returns a false value, the exception is reraised; otherwise, the
exception is terminated. The exception should normally be reraised so that it is
propagated outside the with statement.

5. If the with block does not raise an exception, the __exit__ method is still called,
but its type, value, and traceback arguments are all passed in as None.
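
The steps above can be approximated in ordinary code. The run_with function below is a simplified hand expansion written for this sketch, not Python's actual implementation, and the Recorder class is a hypothetical manager used to observe the calls.

```python
import sys


def run_with(manager, body):
    """Hand-coded approximation of: with manager as value: body(value)."""
    value = manager.__enter__()                     # Step 2
    try:
        body(value)                                 # Step 3
    except BaseException:
        # Step 4: pass exception details; reraise unless __exit__ returns true
        if not manager.__exit__(*sys.exc_info()):
            raise
    else:
        manager.__exit__(None, None, None)          # Step 5


class Recorder:
    """Hypothetical manager that logs calls and optionally swallows errors."""
    def __init__(self, swallow):
        self.swallow = swallow
        self.calls = []
    def __enter__(self):
        self.calls.append('enter')
        return self
    def __exit__(self, exc_type, exc_value, exc_tb):
        self.calls.append(('exit', exc_type))
        return self.swallow


def fail(manager):
    raise KeyError('boom')


quiet = Recorder(swallow=True)
run_with(quiet, fail)                  # Exception terminated by __exit__
print(quiet.calls)

normal = Recorder(swallow=False)
run_with(normal, lambda manager: None)
print(normal.calls)
```

Returning a true value from the exit method ends the exception, which is why the first call completes quietly; the second call shows the exit method receiving None when nothing went wrong.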


Let’s look at a quick demo of the protocol in action. The following defines a context 
manager object that traces the entry and exit of the with block in any with statement it 
is used for: 


class TraceBlock:
    def message(self, arg):
        print('running', arg)
    def __enter__(self):
        print('starting with block')
        return self
    def __exit__(self, exc_type, exc_value, exc_tb):
        if exc_type is None:
            print('exited normally\n')
        else:
            print('raise an exception!', exc_type)
            return False                            # Propagate

with TraceBlock() as action:
    action.message('test 1')
    print('reached')

with TraceBlock() as action:
    action.message('test 2')
    raise TypeError
    print('not reached')


Notice that this class’s __exit__ method returns False to propagate the exception; 
deleting the return statement would have the same effect, as the default None return 
value of functions is False by definition. Also notice that the __enter__ method returns
self as the object to assign to the as variable; in other use cases, this might return a 
completely different object instead. 


When run, the context manager traces the entry and exit of the with statement block 
with its __enter__ and __exit__ methods. Here’s the script in action being run under
Python 3.0 (it runs in 2.6, too, but prints some extra tuple parentheses): 

% python withas.py
starting with block
running test 1
reached
exited normally

starting with block
running test 2
raise an exception! <class 'TypeError'>
Traceback (most recent call last):
  File "withas.py", line 20, in <module>
    raise TypeError
TypeError


Context managers are somewhat advanced devices for tool builders, so we'll skip ad- 
ditional details here (see Python’s standard manuals for the full story—for example, 
there’s a new contextlib standard module that provides additional tools for coding 
context managers). For simpler purposes, the try/finally statement provides sufficient 
support for termination-time activities. 
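
As one illustration of the contextlib tools mentioned above, the module's contextmanager decorator turns a generator function into a context manager: code before the yield serves as the entry action, and code after it as the exit action. The trace_block function and log list below are hypothetical names, loosely mirroring the class-based tracer shown earlier.

```python
from contextlib import contextmanager

log = []


@contextmanager
def trace_block():
    log.append('starting with block')        # Entry action
    try:
        yield 'action'                       # Value assigned to the as variable
    except Exception as exc:
        log.append('raised ' + type(exc).__name__)
        raise                                # Propagate, like returning False
    else:
        log.append('exited normally')


with trace_block() as value:
    log.append('running ' + value)

try:
    with trace_block():
        raise TypeError
except TypeError:
    log.append('propagated')

print(log)
```

An exception in the with block is raised again at the yield inside the generator, so an ordinary try statement there can handle the exit logic.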


In the upcoming Python 3.1 release, the with statement may also specify 
multiple (sometimes referred to as “nested”) context managers with new 
comma syntax. In the following, for example, both files’ exit actions are 
automatically run when the statement block exits, regardless of excep- 
tion outcomes: 


with open('data') as fin, open('res', 'w') as fout:
    for line in fin:
        if 'some key' in line:
            fout.write(line)


Any number of context manager items may be listed, and multiple items 
work the same as nested with statements. In general, the 3.1 (and later) 
code: 


with A() as a, B() as b:
    ...statements...


is equivalent to the following, which works in 3.1, 3.0, and 2.6: 


with A() as a:
    with B() as b:
        ...statements...


See Python 3.1 release notes for additional details. 
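
The equivalence of the comma and nested forms can be checked by recording the order of entry and exit calls for each. The Tagged class here is a hypothetical manager written for this sketch, and the comma form requires Python 3.1 or later.

```python
order = []


class Tagged:
    """Hypothetical manager that records its entry and exit under a tag."""
    def __init__(self, tag):
        self.tag = tag
    def __enter__(self):
        order.append('enter ' + self.tag)
        return self
    def __exit__(self, exc_type, exc_value, exc_tb):
        order.append('exit ' + self.tag)
        return False


with Tagged('A') as a, Tagged('B') as b:     # Comma form (3.1 and later)
    order.append('body')
comma_form = order[:]

del order[:]
with Tagged('A') as a:                       # Nested form
    with Tagged('B') as b:
        order.append('body')
nested_form = order[:]

print(comma_form == nested_form, comma_form)
```

In both forms the managers are entered left to right and exited in the reverse order.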


Chapter Summary 


In this chapter, we took a more detailed look at exception processing by exploring the 
statements related to exceptions in Python: try to catch them, raise to trigger them, 
assert to raise them conditionally, and with to wrap code blocks in context managers 
that specify entry and exit actions. 

So far, exceptions probably seem like a fairly lightweight tool, and in fact, they are; the
only substantially complex thing about them is how they are identified. The next chap- 
ter continues our exploration by describing how to implement exception objects of 
your own; as you'll see, classes allow you to code new exceptions specific to your 
programs. Before we move ahead, though, let’s work through the following short quiz
on the basics covered here. 


Test Your Knowledge: Quiz 


1. What is the try statement for?

2. What are the two common variations of the try statement?

3. What is the raise statement for?

4. What is the assert statement designed to do, and what other statement is it like?

5. What is the with/as statement designed to do, and what other statement is it like?


Test Your Knowledge: Answers 


1. The try statement catches and recovers from exceptions: it specifies a block of
code to run, and one or more handlers for exceptions that may be raised during
the block’s execution.

2. The two common variations on the try statement are try/except/else (for catching
exceptions) and try/finally (for specifying cleanup actions that must occur
whether an exception is raised or not). In Python 2.4 and earlier, these were separate
statements that could be combined by syntactic nesting; in 2.5 and later, except and
finally blocks may be mixed in the same statement, so the two statement forms
are merged. In the merged form, the finally is still run on the way out of the try,
regardless of what exceptions may have been raised or handled.

3. The raise statement raises (triggers) an exception. Python raises built-in
exceptions on errors internally, but your scripts can trigger built-in or user-defined
exceptions with raise, too.

4. The assert statement raises an AssertionError exception if a condition is false. It
works like a conditional raise statement wrapped up in an if statement.

5. The with/as statement is designed to automate startup and termination activities
that must occur around a block of code. It is roughly like a try/finally statement
in that its exit actions run whether an exception occurred or not, but it allows a
richer object-based protocol for specifying entry and exit actions.


CHAPTER 34
Exception Objects 


So far, I’ve been deliberately vague about what an exception actually is. As suggested 
in the prior chapter, in Python 2.6 and 3.0 both built-in and user-defined exceptions 
are identified by class instance objects. Although this means you must use object- 
oriented programming to define new exceptions in your programs, classes and OOP 
in general offer a number of benefits. 


Here are some of the advantages of class-based exceptions: 


* They can be organized into categories. Exception classes support future changes
by providing categories; adding new exceptions in the future won’t generally
require changes in try statements.

* They have attached state information. Exception classes provide a natural place
for us to store context information for use in the try handler; they may have both
attached state information and callable methods, accessible through instances.

* They support inheritance. Class-based exceptions can participate in inheritance
hierarchies to obtain and customize common behavior; inherited display
methods, for example, can provide a common look and feel for error messages.


Because of these advantages, class-based exceptions support program evolution and 
larger systems well. In fact, all built-in exceptions are identified by classes and are 
organized into an inheritance tree, for the reasons just listed. You can do the same with 
user-defined exceptions of your own. 


In Python 3.0, user-defined exceptions inherit from built-in exception superclasses. As 
we'll see here, because these superclasses provide useful defaults for printing and state 
retention, the task of coding user-defined exceptions also involves understanding the 
roles of these built-ins. 

Version skew note: Python 2.6 and 3.0 both require exceptions to be
defined by classes. In addition, 3.0 requires exception classes to be
derived from the BaseException built-in exception superclass, either
directly or indirectly. As we’ll see, most programs inherit from this class’s
Exception subclass, to support catchall handlers for normal exception
types; naming it in a handler will catch everything most programs
should. Python 2.6 allows standalone classic classes to serve as
exceptions, too, but it requires new-style classes to be derived from built-in
exception classes, the same as 3.0.
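
The distinction between Exception and BaseException can be sketched by testing which raised classes an except Exception clause intercepts. The helper name caught_by_exception is an assumption made for this illustration.

```python
def caught_by_exception(raiser_class):
    """Return True if an except Exception clause intercepts raiser_class."""
    try:
        try:
            raise raiser_class()
        except Exception:          # Catchall for normal exception types
            return True
    except BaseException:          # Everything else lands here
        return False


print(caught_by_exception(IndexError))
print(caught_by_exception(KeyboardInterrupt))
print(caught_by_exception(SystemExit))
```

Ordinary errors such as IndexError are intercepted, while system-exit events like KeyboardInterrupt and SystemExit derive from BaseException directly and pass through a handler that names Exception.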


Exceptions: Back to the Future 


Once upon a time (well, prior to Python 2.6 and 3.0), it was possible to define excep- 
tions in two different ways. This complicated try statements, raise statements, and 
Python in general. Today, there is only one way to do it. This is a good thing: it removes 
from the language substantial cruft accumulated for the sake of backward compatibil- 
ity. Because the old way helps explain why exceptions are as they are today, though, 
and because it’s not really possible to completely erase the history of something that 
has been used by a million people over the course of nearly two decades, let’s begin our 
exploration of the present with a brief look at the past. 


String Exceptions Are Right Out! 


Prior to Python 2.6 and 3.0, it was possible to define exceptions with both class in- 
stances and string objects. String-based exceptions began issuing deprecation warnings 
in 2.5 and were removed in 2.6 and 3.0, so today you should use class-based exceptions, 
as shown in this book. If you work with legacy code, though, you might still come 
across string exceptions. They might also appear in tutorials and web resources written 
a few years ago (which qualifies as an eternity in Python years!). 


String exceptions were straightforward to use—any string would do, and they matched 
by object identity, not value (that is, using is, not ==): 

C:\misc> C:\Python25\python
>>> myexc = "My exception string"      # Were we ever this young?
>>> try:
...     raise myexc
... except myexc:
...     print('caught')
...
caught

This form of exception was removed because it was not as good as classes for larger
programs and code maintenance. Although you can’t use string exceptions today, they
actually provide a natural vehicle for introducing the class-based exceptions model.

Class-Based Exceptions


Strings were a simple way to define exceptions. As described earlier, however, classes 
have some added advantages that merit a quick look. Most prominently, they allow us 
to identify exception categories that are more flexible to use and maintain than simple 
strings. Moreover, classes naturally allow for attached exception details and support 
inheritance. Because they are the better approach, they are now required. 


Coding details aside, the chief difference between string and class exceptions has to do 
with the way that exceptions raised are matched against except clauses in try 
statements: 


* String exceptions were matched by simple object identity: the raised exception was
  matched to except clauses by Python’s is test.


* Class exceptions are matched by superclass relationships: the raised exception
  matches an except clause if that except clause names the exception’s class or any
  superclass of it.


That is, when a try statement’s except clause lists a superclass, it catches instances of
that superclass, as well as instances of all its subclasses lower in the class tree. The net
effect is that class exceptions support the construction of exception hierarchies:
superclasses become category names, and subclasses become specific kinds of exceptions
within a category. By naming a general exception superclass, an except clause can catch
an entire category of exceptions: any more specific subclass will match.


String exceptions had no such concept: because they were matched by simple object 
identity, there was no direct way to organize exceptions into more flexible categories 
or groups. The net result was that exception handlers were coupled with exception sets 
in a way that made changes difficult. 


In addition to this category idea, class-based exceptions better support exception state
information (attached to instances) and allow exceptions to participate in inheritance
hierarchies (to obtain common behaviors). Because they offer all the benefits of classes
and OOP in general, they provide a more powerful alternative to the now defunct
string-based exceptions model in exchange for a small amount of additional code.


Coding Exceptions Classes 


Let’s look at an example to see how class exceptions translate to code. In the following 
file, classexc.py, we define a superclass called General and two subclasses called
Specific1 and Specific2. This example illustrates the notion of exception categories:
General is a category name, and its two subclasses are specific types of exceptions within
the category. Handlers that catch General will also catch any subclasses of it, including
Specific1 and Specific2:


class General(Exception): pass
class Specific1(General): pass
class Specific2(General): pass

def raiser0():
    X = General()          # Raise superclass instance
    raise X

def raiser1():
    X = Specific1()        # Raise subclass instance
    raise X

def raiser2():
    X = Specific2()        # Raise different subclass instance
    raise X

for func in (raiser0, raiser1, raiser2):
    try:
        func()
    except General:        # Match General or any subclass of it
        import sys
        print('caught:', sys.exc_info()[0])


C:\python30> python classexc.py
caught: <class '__main__.General'>
caught: <class '__main__.Specific1'>
caught: <class '__main__.Specific2'>


This code is mostly straightforward, but here are a few implementation notes: 


Exception superclass 

Classes used to build exception category trees have very few requirements. In fact,
in this example they are mostly empty, with bodies that do nothing but pass. Notice,
though, how the top-level class here inherits from the built-in Exception class.
This is required in Python 3.0; Python 2.6 allows standalone classic classes to serve
as exceptions too, but it requires new-style classes to be derived from built-in
exception classes just like in 3.0. We don’t make use of it here, but because
Exception provides some useful behavior we’ll meet later, it’s a good idea to inherit
from it in either Python.


Raising instances 
In this code, we call classes to make instances for the raise statements. In the class 
exception model, we always raise and catch a class instance object. If we list a class 
name without parentheses in a raise, Python calls the class with no constructor 
argument to make an instance for us. Exception instances can be created before 
the raise, as done here, or within the raise statement itself. 


Catching categories 
This code includes functions that raise instances of all three of our classes as
exceptions, as well as a top-level try that calls the functions and catches General
exceptions. The same try also catches the two specific exceptions, because they
are subclasses of General.


Exception details
The exception handler here uses the sys.exc_info call. As we'll see in more detail
in the next chapter, it’s how we can grab hold of the most recently raised exception
in a generic fashion. Briefly, the first item in its result is the class of the exception
raised, and the second is the actual instance raised. In a general except clause like
the one here that catches all classes in a category, sys.exc_info is one way to
determine exactly what’s occurred. In this particular case, it’s equivalent to fetching
the instance’s __class__ attribute. As we'll see in the next chapter, the
sys.exc_info scheme is also commonly used with empty except clauses that catch
everything.


The last point merits further explanation. When an exception is caught, we can be sure 
that the instance raised is an instance of the class listed in the except, or one of its more 
specific subclasses. Because of this, the __class__ attribute of the instance also gives 
the exception type. The following variant, for example, works the same as the prior 
example: 

class General(Exception): pass
class Specific1(General): pass
class Specific2(General): pass

def raiser0(): raise General()
def raiser1(): raise Specific1()
def raiser2(): raise Specific2()

for func in (raiser0, raiser1, raiser2):
    try:
        func()
    except General as X:                     # X is the raised instance
        print('caught:', X.__class__)        # Same as sys.exc_info()[0]


Because __class__ can be used like this to determine the specific type of exception
raised, sys.exc_info is more useful for empty except clauses that do not otherwise have
a way to access the instance or its class. Furthermore, more realistic programs usually
should not have to care about which specific exception was raised at all: by calling
methods of the instance generically, we automatically dispatch to behavior tailored for
the exception raised. More on this and sys.exc_info in the next chapter; also see
Chapter 28 and Part VI at large if you’ve forgotten what __class__ means in an instance.
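
To make the idea of generic dispatch concrete, here is a minimal sketch. The describe method is a hypothetical name invented for this illustration (it is not part of classexc.py); the point is only that the handler calls the same method on whatever instance it catches, and each class supplies or inherits its own version:

```python
class General(Exception):
    def describe(self):                      # Hypothetical method, for illustration only
        return 'general problem'

class Specific1(General):
    def describe(self):                      # Subclass tailors the behavior
        return 'specific problem 1'

class Specific2(General):
    pass                                     # Inherits describe from General

def raiser1(): raise Specific1()
def raiser2(): raise Specific2()

results = []
for func in (raiser1, raiser2):
    try:
        func()
    except General as X:
        results.append(X.describe())         # No test of X.__class__ required

print(results)
```

The handler never checks which subclass it received; normal method lookup in the class tree picks the right behavior.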


Why Exception Hierarchies? 


Because there are only three possible exceptions in the prior section’s example, it 
doesn’t really do justice to the utility of class exceptions. In fact, we could achieve the 
same effects by coding a list of exception names in parentheses within the except clause: 
try:
    func()
except (General, Specific1, Specific2):      # Catch any of these
    ...


This approach worked for the defunct string exception model too. For large or deep
exception hierarchies, however, it may be easier to catch a whole category by naming
its superclass than to list every member of the category in a single except clause.
Perhaps more importantly, you can extend exception hierarchies by adding new subclasses
without breaking existing code.


Suppose, for example, you code a numeric programming library in Python, to be used 
by a large number of people. While you are writing your library, you identify two things 
that can go wrong with numbers in your code—division by zero, and numeric overflow. 
You document these as the two exceptions that your library may raise: 


# mathlib.py

class Divzero(Exception): pass
class Oflow(Exception): pass

def func():
    raise Divzero()


Now, when people use your library, they typically wrap calls to your functions or classes 
in try statements that catch your two exceptions (if they do not catch your exceptions, 
exceptions from the library will kill their code): 


# client.py

import mathlib

try:
    mathlib.func(...)
except (mathlib.Divzero, mathlib.Oflow):
    ...handle and recover...


This works fine, and lots of people start using your library. Six months down the road, 
though, you revise it (as programmers are prone to do). Along the way, you identify a 
new thing that can go wrong—underflow—and add that as a new exception: 


# mathlib.py 


class Divzero(Exception): pass 
class Oflow(Exception): pass 
class Uflow(Exception): pass 


Unfortunately, when you re-release your code, you create a maintenance problem for 
your users. If they’ve listed your exceptions explicitly, they now have to go back and 
change every place they call your library to include the newly added exception name: 


# client.py

try:
    mathlib.func(...)
except (mathlib.Divzero, mathlib.Oflow, mathlib.Uflow):
    ...handle and recover...


This may not be the end of the world. If your library is used only in-house, you can
make the changes yourself. You might also ship a Python script that tries to fix such 
code automatically (it would probably be only a few dozen lines, and it would guess 
right at least some of the time). If many people have to change all their try statements 
each time you alter your exception set, though, this is not exactly the most polite of 
upgrade policies. 


Your users might try to avoid this pitfall by coding empty except clauses to catch all 
possible exceptions: 


# client.py

try:
    mathlib.func(...)
except:                                      # Catch everything here
    ...handle and recover...


But this workaround might catch more than they bargained for—things like running 
out of memory, keyboard interrupts (Ctrl-C), system exits, and even typos in their own 
try block’s code will all trigger exceptions, and such things should pass, not be caught 
and erroneously classified as library errors. 
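
The following sketch shows the pitfall in action. It is an illustration only: func and undefined_result are hypothetical stand-ins for a library call and a misspelled name, not code from mathlib. The empty except swallows a NameError caused by a typo, which the client would then wrongly treat as a library error:

```python
import sys

def func():
    return undefined_result          # Typo: raises NameError, not a library exception

try:
    func()
except:                              # Empty except: catches the NameError too
    kind = sys.exc_info()[0].__name__

print('handled as a library error, but was really:', kind)
```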


And really, in this scenario users want to catch and recover from only the specific
exceptions the library is defined and documented to raise; if any other exception occurs
during a library call, it’s likely a genuine bug in the library (and probably time to contact
the vendor!). As a rule of thumb, it’s usually better to be specific than general in
exception handlers, an idea we'll revisit as a “gotcha” in the next chapter.*


So what to do, then? Class exception hierarchies fix this dilemma completely. Rather 
than defining your library’s exceptions as a set of autonomous classes, arrange them 
into a class tree with a common superclass to encompass the entire category: 


# mathlib.py

class NumErr(Exception): pass
class Divzero(NumErr): pass
class Oflow(NumErr): pass

def func():
    raise Divzero()


This way, users of your library simply need to list the common superclass (i.e., category)
to catch all of your library’s exceptions, both now and in the future:


# client.py

import mathlib

try:
    mathlib.func(...)
except mathlib.NumErr:
    ...report and recover...


* As a clever student of mine suggested, the library module could also provide a tuple object that contains all
the exceptions the library can possibly raise. The client could then import the tuple and name it in an
except clause to catch all the library’s exceptions (recall that including a tuple in an except means catch
any of its exceptions). When new exceptions are added later, the library can just expand the exported tuple.
This would work, but you’d still need to keep the tuple up-to-date with raised exceptions inside the library
module. Also, class hierarchies offer more benefits than just categories: they also support inherited state
and methods and a customization model that individual exceptions do not.


When you go back and hack your code again, you can add new exceptions as new 
subclasses of the common superclass: 


# mathlib.py 


class Uflow(NumErr): pass 


The end result is that user code that catches your library’s exceptions will keep working, 
unchanged. In fact, you are free to add, delete, and change exceptions arbitrarily in the 
future—as long as clients name the superclass, they are insulated from changes in your 
exceptions set. In other words, class exceptions provide a better answer to maintenance 
issues than strings do. 
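
Here is a self-contained sketch of that outcome. To keep it runnable in one file, the library classes and the client code are placed together rather than in separate mathlib.py and client.py files, and client_call and new_failure are hypothetical helper names invented for this example:

```python
# Library side: a category superclass plus specific exceptions
class NumErr(Exception): pass
class Divzero(NumErr): pass
class Oflow(NumErr): pass
class Uflow(NumErr): pass                    # Added in a later release

# Client side: written before Uflow existed, and never changed
def client_call(func):
    try:
        func()
    except NumErr as exc:                    # Names only the category
        return 'recovered from ' + exc.__class__.__name__

def new_failure():
    raise Uflow()                            # The newly added exception

print(client_call(new_failure))
```

Because the client names only NumErr, the new Uflow subclass is caught without any edits to the client's try statement.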


Class-based exception hierarchies also support state retention and inheritance in ways 
that make them ideal in larger programs. To understand these roles, though, we first 
need to see how user-defined exception classes relate to the built-in exceptions from 
which they inherit. 


Built-in Exception Classes 


I didn’t really pull the prior section’s examples out of thin air. All built-in exceptions 
that Python itself may raise are predefined class objects. Moreover, they are organized 
into a shallow hierarchy with general superclass categories and specific subclass types, 
much like the exceptions class tree we developed earlier. 


In Python 3.0, all the familiar exceptions you've seen (e.g., SyntaxError) are really just
predefined classes, available as built-in names in the module named builtins (in Python
2.6, they instead live in __builtin__ and are also attributes of the standard library
module exceptions). In addition, Python organizes the built-in exceptions into a
hierarchy, to support a variety of catching modes. For example:


BaseException 
The top-level root superclass of exceptions. This class is not supposed to be directly 
inherited by user-defined classes (use Exception instead). It provides default print- 
ing and state retention behavior inherited by subclasses. If the str built-in is called 
on an instance of this class (e.g., by print), the class returns the display strings of 
the constructor arguments passed when the instance was created (or an empty 
string if there were no arguments). In addition, unless subclasses replace this class’s
constructor, all of the arguments passed to this class at instance construction time
are stored in its args attribute as a tuple.


Exception 

The top-level root superclass of application-related exceptions. This is an immediate
subclass of BaseException and is superclass to every other built-in exception,
except the system exit event classes (SystemExit, KeyboardInterrupt, and
GeneratorExit). Almost all user-defined classes should inherit from this class, not
BaseException. When this convention is followed, naming Exception in a try
statement’s handler ensures that your program will catch everything but system exit
events, which should normally be allowed to pass. In effect, Exception becomes a
catchall in try statements and is more accurate than an empty except.


ArithmeticError 
The superclass of all numeric errors (and a subclass of Exception). 


OverflowError 
A subclass of ArithmeticError that identifies a specific numeric error. 


And so on; you can read further about this structure in reference texts such as Python
Pocket Reference or the Python library manual. Note that the exceptions class tree
differs slightly between Python 3.0 and 2.6. Also note that you can see the class tree in the
help text of the exceptions module in Python 2.6 only (this module is removed in 3.0).
See Chapters 4 and 15 for help on help:

>>> import exceptions
>>> help(exceptions)
...lots of text omitted...
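
Since the exceptions module is not available in 3.0, one alternative there is to inspect the classes themselves. The sketch below is just one possible way to explore the tree; it relies only on the standard __mro__ class attribute and the issubclass built-in, and the variable name chain is arbitrary:

```python
# Walk from a specific built-in exception up to the root of the class tree
chain = [cls.__name__ for cls in OverflowError.__mro__]
print(chain)

# Spot-check a few category relationships
print(issubclass(ZeroDivisionError, ArithmeticError))    # Numeric error category
print(issubclass(KeyboardInterrupt, Exception))          # System exit event: not under Exception
```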


Built-in Exception Categories 


The built-in class tree allows you to choose how specific or general your handlers will 
be. For example, the built-in exception ArithmeticError is a superclass for more specific 
exceptions such as OverflowError and ZeroDivisionError. By listing ArithmeticError 
in a try, you will catch any kind of numeric error raised; by listing just 
OverflowError, you will intercept just that specific type of error, and no others. 


Similarly, because Exception is the superclass of all application-level exceptions in
Python 3.0, you can generally use it as a catchall. The effect is much like an empty
except, but it allows system exit exceptions to pass as they usually should:


try:
    action()
except Exception:
    ...handle all application exceptions...
else:
    ...handle no-exception case...


This doesn’t quite work universally in Python 2.6, however, because standalone
user-defined exceptions coded as classic classes are not required to be subclasses of the
Exception root class. This technique is more reliable in Python 3.0, since it requires all
classes to derive from built-in exceptions. Even in Python 3.0, though, this scheme
suffers most of the same potential pitfalls as the empty except, as described in the prior
chapter: it might intercept exceptions intended for elsewhere, and it might mask
genuine programming errors. Since this is such a common issue, we'll revisit it as a “gotcha”
in the next chapter.
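
As a quick check of which exceptions an except Exception clause does and does not intercept in 3.X, consider the following sketch. The which_handler function is a hypothetical helper written for this illustration; the second clause exists only so the demonstration can report the result instead of ending the program:

```python
def which_handler(exc):
    """Raise exc and report which clause intercepted it."""
    try:
        raise exc
    except Exception:
        return 'Exception handler'
    except BaseException:                    # Only here so the demo keeps running
        return 'not matched by Exception'

print(which_handler(IndexError('spam')))     # Application-level error
print(which_handler(KeyboardInterrupt()))    # System exit event
print(which_handler(SystemExit()))           # System exit event
```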


Whether or not you will leverage the categories in the built-in class tree, it serves as a 
good example; by using similar techniques for class exceptions in your own code, you 
can provide exception sets that are flexible and easily modified. 


Default Printing and State 


Built-in exceptions also provide default print displays and state retention, which is often 
as much logic as user-defined classes require. Unless you redefine the constructors your 
classes inherit from them, any constructor arguments you pass to these classes are saved 
in the instance’s args tuple attribute and are automatically displayed when the instance 
is printed (an empty tuple and display string are used if no constructor arguments are 
passed). 


This explains why arguments passed to built-in exception classes show up in error 
messages—any constructor arguments are attached to the instance and displayed when 
the instance is printed: 

>>> raise IndexError                         # Same as IndexError(): no arguments
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError

>>> raise IndexError('spam')                 # Constructor argument attached, printed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: spam

>>> I = IndexError('spam')                   # Available in object attribute
>>> I.args
('spam',)


The same holds true for user-defined exceptions, because they inherit the constructor
and display methods present in their built-in superclasses:
>>> class E(Exception): pass
...
>>> try:
...     raise E('spam')
... except E as X:
...     print(X, X.args)                     # Displays and saves constructor arguments
...
spam ('spam',)




>>> try:
...     raise E('spam', 'eggs', 'ham')
... except E as X:
...     print(X, X.args)
...
('spam', 'eggs', 'ham') ('spam', 'eggs', 'ham')


Note that exception instance objects are not strings themselves, but use the __str__
operator overloading protocol we studied in Chapter 29 to provide display strings when
printed; to concatenate with real strings, perform manual conversions: str(X) +
"string".


Although this automatic state and display support is useful by itself, for more specific
display and state retention needs you can always redefine inherited methods such as
__str__ and __init__ in Exception subclasses; the next section shows how.
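
The manual conversion mentioned above can be sketched as follows. This is an illustration only, reusing the empty class E from the earlier interaction; the variable name text is arbitrary:

```python
class E(Exception): pass

X = E('spam')
try:
    text = X + ' and eggs'                   # Fails: an exception instance is not a string
except TypeError:
    text = str(X) + ' and eggs'              # Convert manually, then concatenate

print(text)
```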


Custom Print Displays 


As we saw in the preceding section, by default, instances of class-based exceptions 
display whatever you passed to the class constructor when they are caught and printed: 


>>> class MyBad(Exception): pass
...
>>> try:
...     raise MyBad('Sorry--my mistake!')
... except MyBad as X:
...     print(X)
...
Sorry--my mistake!


This inherited default display model is also used if the exception is displayed as part of 
an error message when the exception is not caught: 

>>> raise MyBad('Sorry--my mistake!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.MyBad: Sorry--my mistake!


For many roles, this is sufficient. To provide a more custom display, though, you can 
define one of two string-representation overloading methods in your class (__repr__ or 
__str__) to return the string you want to display for your exception. The string the 
method returns will be displayed if the exception either is caught and printed or reaches 
the default handler: 

>>> class MyBad(Exception):
...     def __str__(self):
...         return 'Always look on the bright side of life...'
...
>>> try:
...     raise MyBad()
... except MyBad as X:
...     print(X)
...
Always look on the bright side of life...

>>> raise MyBad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.MyBad: Always look on the bright side of life...


A subtle point to note here is that you generally must redefine __str__ for this purpose,
because the built-in superclasses already have a __str__ method, and __str__ is
preferred to __repr__ in most contexts (including printing). If you define a __repr__,
printing will happily call the superclass’s __str__ instead! See Chapter 29 for more details
on these special methods.
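
This subtlety is easy to verify. In the following sketch, the class names ReprOnly and StrDefined are hypothetical and exist only for the comparison; the first class defines just __repr__, the second just __str__:

```python
class ReprOnly(Exception):
    def __repr__(self):
        return 'ReprOnly custom display'

class StrDefined(Exception):
    def __str__(self):
        return 'StrDefined custom display'

print(str(ReprOnly('spam')))                 # Inherited __str__ wins: shows the argument
print(repr(ReprOnly('spam')))                # Custom text appears only for repr
print(str(StrDefined('spam')))               # Custom __str__ is used when printing
```

Printing the ReprOnly instance falls back on the inherited __str__, so the custom text never shows up in print or in the default error message.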


Whatever your method returns is included in error messages for uncaught exceptions
and used when exceptions are printed explicitly. The method returns a hardcoded
string here to illustrate, but it can also perform arbitrary text processing, possibly using
state information attached to the instance object. The next section looks at state
information options.
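
As a preview, here is a minimal sketch of a __str__ that builds its text from instance state instead of returning a fixed string. The attribute names who and what, and the message wording, are assumptions made up for this example:

```python
class MyBad(Exception):
    def __init__(self, who, what):
        self.who = who                       # State attached by a custom constructor
        self.what = what
    def __str__(self):
        return '%s reports: %s' % (self.who, self.what)

try:
    raise MyBad('parser', 'unexpected end of file')
except MyBad as X:
    message = str(X)

print(message)
```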


Custom Data and Behavior 


Besides supporting flexible hierarchies, exception classes also provide storage for extra
state information as instance attributes. As we saw earlier, built-in exception
superclasses provide a default constructor that automatically saves constructor arguments
in an instance tuple attribute named args. Although the default constructor is adequate
for many cases, for more custom needs we can provide a constructor of our own. In
addition, classes may define methods for use in handlers that provide precoded
exception processing logic.


Providing Exception Details 


When an exception is raised, it may cross arbitrary file boundaries: the raise
statement that triggers an exception and the try statement that catches it may be in
completely different module files. It is not generally feasible to store extra details in global
variables because the try statement might not know which file the globals reside in.
Passing extra state information along in the exception itself allows the try statement
to access it more reliably.


With classes, this is nearly automatic. As we’ve seen, when an exception is raised,
Python passes the class instance object along with the exception. Code in try statements
can access the raised instance by listing an extra variable after the as keyword in an
except handler. This provides a natural hook for supplying data and behavior to the
handler.†


For example, a program that parses data files might signal a formatting error by raising
an exception instance that is filled out with extra details about the error:
>>> class FormatError(Exception):
...     def __init__(self, line, file):
...         self.line = line
...         self.file = file
...
>>> def parser():
...     raise FormatError(42, file='spam.txt')     # When error found
...
>>> try:
...     parser()
... except FormatError as X:
...     print('Error at', X.file, X.line)
...
Error at spam.txt 42


In the except clause here, the variable X is assigned a reference to the instance that was 
generated when the exception was raised. This gives access to the attributes attached 
to the instance by the custom constructor. Although we could rely on the default state 
retention of built-in superclasses, it’s less relevant to our application: 


>>> class FormatError(Exception): pass             # Inherited constructor
...
>>> def parser():
...     raise FormatError(42, 'spam.txt')          # No keywords allowed!
...
>>> try:
...     parser()
... except FormatError as X:
...     print('Error at:', X.args[0], X.args[1])   # Not specific to this app
...
Error at: 42 spam.txt


Providing Exception Methods 


Besides enabling application-specific state information, custom constructors also better 
support extra behavior for exception objects. That is, the exception class can also define 
methods to be called in the handler. The following, for example, adds a method that 
uses exception state information to log errors to a file: 


class FormatError(Exception):
    logfile = 'formaterror.txt'
    def __init__(self, line, file):
        self.line = line
        self.file = file
    def logerror(self):
        log = open(self.logfile, 'a')
        print('Error at', self.file, self.line, file=log)

def parser():
    raise FormatError(40, 'spam.txt')

try:
    parser()
except FormatError as exc:
    exc.logerror()


When run, this script writes its error message to a file in response to method calls in
the exception handler:


C:\misc> C:\Python30\python parse.py

C:\misc> type formaterror.txt
Error at spam.txt 40


† As suggested earlier, the raised instance object is also available generically as the second item in the result
tuple of the sys.exc_info() call, a tool that returns information about the most recently raised exception.
This interface must be used if you do not list an exception name in an except clause but still need access to
the exception that occurred, or to any of its attached state information or methods. More on sys.exc_info
in the next chapter.


In such a class, methods (like logerror) may also be inherited from superclasses, and
instance attributes (like line and file) provide a place to save state information that
provides extra context for use in later method calls. Moreover, exception classes are
free to customize and extend inherited behavior. In other words, because they are
defined with classes, all the benefits of OOP that we studied in Part VI are available for
use with exceptions in Python.
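
To illustrate inheriting and extending exception methods, here is a brief sketch. The class names LoggedError and MissingFieldError are hypothetical, and an in-memory io.StringIO object stands in for a real log file so that the example leaves no files behind:

```python
import io

class LoggedError(Exception):
    log = io.StringIO()                          # In-memory stand-in for a log file
    def logerror(self):
        print(self.__class__.__name__, *self.args, file=self.log)

class FormatError(LoggedError):
    pass                                         # Inherits logerror unchanged

class MissingFieldError(LoggedError):
    def logerror(self):                          # Extends the inherited method
        print('Incomplete record:', end=' ', file=self.log)
        LoggedError.logerror(self)

for exc in (FormatError(40, 'spam.txt'), MissingFieldError('name')):
    try:
        raise exc
    except LoggedError as X:                     # Catch the category, call generically
        X.logerror()

print(LoggedError.log.getvalue())
```

The handler names only the superclass and calls logerror generically; each subclass either reuses or extends the logging behavior it inherits.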


Chapter Summary 


In this chapter, we explored coding user-defined exceptions. As we learned, exceptions
are implemented as class instance objects in Python 2.6 and 3.0 (an earlier string-based
exception model alternative was available in earlier releases but has now been
removed). Exception classes support the concept of exception hierarchies that ease
maintenance, allow data and behavior to be attached to exceptions as instance attributes
and methods, and allow exceptions to inherit data and behavior from superclasses.


We saw that in a try statement, catching a superclass catches that class as well as all
subclasses below it in the class tree: superclasses become exception category names,
and subclasses become more specific exception types within those categories. We also
saw that the built-in exception superclasses we must inherit from provide usable
defaults for printing and state retention, which we can override if desired.


The next chapter wraps up this part of the book by exploring some common use cases 
for exceptions and surveying tools commonly used by Python programmers. Before we 
get there, though, here’s this chapter’s quiz.


Test Your Knowledge: Quiz


1. What are the two new constraints on user-defined exceptions in Python 3.0?

2. How are raised class-based exceptions matched to handlers?

3. Name two ways that you can attach context information to exception objects.

4. Name two ways that you can specify the error message text for exception objects.

5. Why should you not use string-based exceptions anymore today?


Test Your Knowledge: Answers 


1. In 3.0, exceptions must be defined by classes (that is, a class instance object is raised
   and caught). In addition, exception classes must be derived from the built-in class
   BaseException (most programs inherit from its Exception subclass, to support
   catchall handlers for normal kinds of exceptions).

2. Class-based exceptions match by superclass relationships: naming a superclass in
   an exception handler will catch instances of that class, as well as instances of any
   of its subclasses lower in the class tree. Because of this, you can think of superclasses
   as general exception categories and subclasses as more specific types of exceptions
   within those categories.

3. You can attach context information to class-based exceptions by filling out instance
   attributes in the instance object raised, usually in a custom class constructor. For
   simpler needs, built-in exception superclasses provide a constructor that stores its
   arguments on the instance automatically (in the attribute args). In exception
   handlers, you list a variable to be assigned to the raised instance, then go through this
   name to access attached state information and call any methods defined in the class.

4. The error message text in class-based exceptions can be specified with a custom
   __str__ operator overloading method. For simpler needs, built-in exception
   superclasses automatically display anything you pass to the class constructor.
   Operations like print and str automatically fetch the display string of an exception
   object when it is printed either explicitly or as part of an error message.

5. Because Guido said so: they have been removed in both Python 2.6 and 3.0.
   Really, there are good reasons for this: string-based exceptions did not support
   categories, state information, or behavior inheritance in the way class-based
   exceptions do. In practice, this made string-based exceptions easier to use at first,
   when programs were small, but more complex to use as programs grew larger.


CHAPTER 35
Designing with Exceptions 


This chapter rounds out this part of the book with a collection of exception design 
topics and common use case examples, followed by this part’s gotchas and exercises. 
Because this chapter also closes out the fundamentals portion of the book at large, it 
includes a brief overview of development tools as well to help you as you make the 
migration from Python beginner to Python application developer. 


Nesting Exception Handlers 


Our examples so far have used only a single try to catch exceptions, but what happens 
if one try is physically nested inside another? For that matter, what does it mean if a 
try calls a function that runs another try? Technically, try statements can nest, in terms 
of syntax and the runtime control flow through your code. 


Both of these cases can be understood if you realize that Python stacks try statements 
at runtime. When an exception is raised, Python returns to the most recently entered 
try statement with a matching except clause. Because each try statement leaves a 
marker, Python can jump back to earlier trys by inspecting the stacked markers. This 
nesting of active handlers is what we mean when we talk about propagating exceptions 
up to “higher” handlers—such handlers are simply try statements entered earlier in 
the program’s execution flow. 


Figure 35-1 illustrates what occurs when try statements with except clauses nest at 
runtime. The amount of code that goes into a try block can be substantial, and it may 
contain function calls that invoke other code watching for the same exceptions. When 
an exception is eventually raised, Python jumps back to the most recently entered 
try statement that names that exception, runs that statement’s except clause, and then 
resumes execution after that try. 


Once the exception is caught, its life is over. Control does not jump back to all
matching trys that name the exception; only the first one is given the opportunity to handle
it. In Figure 35-1, for instance, the raise statement in the function func2 sends control
back to the handler in func1, and then the program continues within func1.


[Figure 35-1 diagram not reproduced: code outlines of func1 and func2 with their try and except E clauses; func1 calls func2, and func2 raises E.]


Figure 35-1. Nested try/except statements: when an exception is raised (by you or by Python), control 
jumps back to the most recently entered try statement with a matching except clause, and the program 
resumes after that try statement. except clauses intercept and stop the exception—they are where you 
process and recover from exceptions. 


By contrast, when try statements that contain only finally clauses are nested, each 
finally block is run in turn when an exception occurs—Python continues propagating 
the exception up to other trys, and eventually perhaps to the top-level default handler 
(the standard error message printer). As Figure 35-2 illustrates, the finally clauses do 
not kill the exception—they just specify code to be run on the way out of each try 
during the exception propagation process. If there are many try/finally clauses active 
when an exception occurs, they will all be run, unless a try/except catches the exception 
somewhere along the way. 


[Figure 35-2 diagram: one try/finally wraps a call to func1, and another try/finally wraps a call to func2.]


Figure 35-2. Nested try/finally statements: when an exception is raised here, control returns to the 
most recently entered try to run its finally statement, but then the exception keeps propagating to all 
finallys in all active try statements and eventually reaches the default top-level handler, where an 
error message is printed. finally clauses intercept (but do not stop) an exception—they are for actions 
to be performed “on the way out.” 


In other words, where the program goes when an exception is raised depends entirely 
upon where it has been—it’s a function of the runtime flow of control through the script, 
not just its syntax. The propagation of an exception essentially proceeds backward 
through time to try statements that have been entered but not yet exited. This propa- 
gation stops as soon as control is unwound to a matching except clause, but not as it 
passes through finally clauses on the way. 
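
The stacking behavior described above can be traced with a short sketch. The function names (inner, middle, outer) and the events list are made up for this illustration; they are not part of this chapter's example files:

```python
events = []                               # Records the order in which clauses run

def inner():
    try:
        raise KeyError('spam')            # Raised with three try statements active
    finally:
        events.append('inner finally')    # Runs, but does not stop the exception

def middle():
    try:
        inner()
    finally:
        events.append('middle finally')   # Also runs on the way out

def outer():
    try:
        middle()
    except KeyError:
        events.append('outer except')     # Propagation stops here
    events.append('outer continues')      # Execution resumes after the try

outer()
print(events)
```

Both finally blocks run as the exception passes through them, and the exception ends at the first matching except, after which outer carries on normally.
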


Example: Control-Flow Nesting 


Let’s turn to an example to make this nesting concept more concrete. The following 
module file, nestexc.py, defines two functions. action2 is coded to trigger an exception 
(you can’t add numbers and sequences), and action1 wraps a call to action2 in a try 
handler, to catch the exception: 


def action2():
    print(1 + [])              # Generate TypeError

def action1():
    try:
        action2()
    except TypeError:          # Most recent matching try
        print('inner try')

try:
    action1()
except TypeError:              # Here, only if action1 re-raises
    print('outer try')


% python nestexc.py 
inner try 


Notice, though, that the top-level module code at the bottom of the file wraps a call to 
action1 in a try handler, too. When action2 triggers the TypeError exception, there will 
be two active try statements—the one in action1, and the one at the top level of the 
module file. Python picks and runs just the most recent try with a matching except, 
which in this case is the try inside action1. 


As I’ve mentioned, the place where an exception winds up jumping to depends on the 
control flow through the program at runtime. Because of this, to know where you will 
go, you need to know where you’ve been. In this case, where exceptions are handled 
is more a function of control flow than of statement syntax. However, we can also nest 
exception handlers syntactically—an equivalent case we’ll look at next. 
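
To see the outer handler fire as well, the inner handler has to re-raise. The following is a minimal variation on nestexc.py that assumes a bare raise is added to action1's handler and that messages are collected in a list (called log here, a name of my choosing) instead of being printed:

```python
log = []

def action2():
    return 1 + []                # Triggers TypeError

def action1():
    try:
        action2()
    except TypeError:
        log.append('inner try')
        raise                    # Re-raise: propagate to the next active try

try:
    action1()
except TypeError:
    log.append('outer try')      # Reached only because action1 re-raised

print(log)
```

Without the raise in action1, only 'inner try' would be recorded, just as in the original file's output.
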


Example: Syntactic Nesting 


As I mentioned when we looked at the new unified try/except/finally statement in 
Chapter 33, it is possible to nest try statements syntactically by their position in your 
source code: 


try:
    try:
        action2()
    except TypeError:          # Most recent matching try
        print('inner try')
except TypeError:              # Here, only if nested handler re-raises
    print('outer try')


Really, this code just sets up the same handler-nesting structure as (and behaves identically 
to) the prior example. In fact, syntactic nesting works just like the cases sketched 
in Figures 35-1 and 35-2; the only difference is that the nested handlers are physically 
embedded in a try block, not coded in functions called elsewhere. For example, nested 
finally handlers all fire on an exception, whether they are nested syntactically or by 
means of the runtime flow through physically separated parts of your code: 


>>> try:
...     try:
...         raise IndexError
...     finally:
...         print('spam')
... finally:
...     print('SPAM')
...
spam
SPAM
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
IndexError


See Figure 35-2 for a graphic illustration of this code’s operation; the effect is the same, 
but the function logic has been inlined as nested statements here. For a more useful 
example of syntactic nesting at work, consider the following file, except-finally.py: 


def raise1(): raise IndexError
def noraise(): return
def raise2(): raise SyntaxError

for func in (raise1, noraise, raise2):
    print('\n', func, sep='')
    try:
        try:
            func()
        except IndexError:
            print('caught IndexError')
    finally:
        print('finally run')


This code catches an exception if one is raised and performs a finally termination- 
time action regardless of whether an exception occurs. This may take a few moments 
to digest, but the effect is much like combining an except and a finally clause in a 
single try statement in Python 2.5 and later: 

% python except-finally.py

<function raise1 at 0x026ECA98>
caught IndexError
finally run

<function noraise at 0x026ECA50>
finally run

<function raise2 at 0x026ECBB8>
finally run
Traceback (most recent call last):
  File "except-finally.py", line 9, in <module>
    func()
  File "except-finally.py", line 3, in raise2
    def raise2(): raise SyntaxError
SyntaxError: None


As we saw in Chapter 33, as of Python 2.5, except and finally clauses can be mixed 
in the same try statement. This makes some of the syntactic nesting described in this 
section unnecessary, though it still works, may appear in code written prior to Python 
2.5 that you may encounter, and can be used as a technique for implementing alter- 
native exception-handling behaviors. 
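
For comparison, here is a sketch of the same three cases coded with a single merged try statement. The run helper and the results dictionary are assumptions added for this illustration, and clause names are recorded rather than printed so the outcome is easy to inspect:

```python
def raise1():  raise IndexError
def noraise(): return
def raise2():  raise SyntaxError

def run(func):
    """Return the clause names that ran, in order, for one call to func."""
    ran = []
    try:
        func()
    except IndexError:
        ran.append('except')
    finally:
        ran.append('finally')    # Runs whether or not an exception occurred
    return ran

results = {}
for func in (raise1, noraise):
    results[func.__name__] = run(func)

try:
    run(raise2)                  # SyntaxError is not caught by run's except
except SyntaxError:
    results['raise2'] = 'propagated after finally'

print(results)
```

As in the nested version, the finally action runs in all three cases, and the uncaught SyntaxError keeps propagating after it.
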


Exception Idioms 


We’ve seen the mechanics behind exceptions. Now let’s take a look at some of the other 
ways they are typically used. 


Exceptions Aren't Always Errors 


In Python, all errors are exceptions, but not all exceptions are errors. For instance, we 
saw in Chapter 9 that file object read methods return an empty string at the end of a 
file. In contrast, the built-in input function (which we first met in Chapter 3 and de- 
ployed in an interactive loop in Chapter 10) reads a line of text from the standard input 
stream, sys.stdin, at each call and raises the built-in EOFError at end-of-file. (This 
function is known as raw_input in Python 2.6.) 


Unlike file methods, this function does not return an empty string—an empty string 
from input means an empty line. Despite its name, the EOFError exception is just a 
signal in this context, not an error. Because of this behavior, unless the end-of-file 
should terminate a script, input often appears wrapped in a try handler and nested in 
a loop, as in the following code: 


while True:
    try:
        line = input()           # Read line from stdin
    except EOFError:
        break                    # Exit loop at end-of-file
    else:
        ...process next line here...


Several other built-in exceptions are similarly signals, not errors. For example, calling 
sys.exit() and pressing Ctrl-C on your keyboard raise SystemExit and KeyboardInterrupt, 
respectively. Python also has a set of built-in exceptions that represent 
warnings rather than errors; some of these are used to signal use of deprecated (phased 
out) language features. See the standard library manual’s description of built-in excep- 
tions for more information, and consult the warnings module’s documentation for more 
on warnings. 
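
The input loop shown earlier can be tried without typing anything by temporarily pointing sys.stdin at an in-memory text stream. This is only a sketch: the read_lines function, the sample text, and the use of io.StringIO as a stand-in for real input are all assumptions made for the demonstration:

```python
import io
import sys

def read_lines():
    """Collect lines from standard input until end-of-file."""
    lines = []
    while True:
        try:
            line = input()           # Raises EOFError at end-of-file
        except EOFError:
            break                    # A signal, not an error: leave the loop
        else:
            lines.append(line)       # An empty string here is an empty line
    return lines

saved_stdin = sys.stdin
sys.stdin = io.StringIO('spam\n\neggs\n')    # Stand-in for real input
try:
    collected = read_lines()
finally:
    sys.stdin = saved_stdin          # Always restore the real stream

print(collected)
```

The empty string in the middle of the result is the blank line in the input; the end of the text is reported only by the EOFError.
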


Functions Can Signal Conditions with raise 


User-defined exceptions can also signal nonerror conditions. For instance, a search 
routine can be coded to raise an exception when a match is found instead of returning 
a status flag for the caller to interpret. In the following, the try/except/else exception 
handler does the work of an if/else return-value tester: 


class Found(Exception): pass

def searcher():
    if ...success...:
        raise Found()
    else:
        return

try:
    searcher()
except Found:                    # Exception if item was found
    ...success...
else:                            # else returned: not found
    ...failure...


More generally, such a coding structure may also be useful for any function that cannot 
return a sentinel value to designate success or failure. For instance, if all objects are 
potentially valid return values, it’s impossible for any return value to signal unusual 
conditions. Exceptions provide a way to signal results without a return value: 


class Failure(Exception): pass

def searcher():
    if ...success...:
        return ...founditem...
    else:
        raise Failure()

try:
    item = searcher()
except Failure:
    ...report...
else:
    ...use item here...


Because Python is dynamically typed and polymorphic to the core, exceptions, rather 
than sentinel return values, are the generally preferred way to signal such conditions. 
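
As a concrete sketch of the second pattern, consider a search over a list in which None is itself a legitimate item, so None cannot double as a "not found" flag. The names NotFound, first_match, and data are hypothetical, chosen only for this example:

```python
class NotFound(Exception):
    pass

def first_match(items, test):
    """Return the first item for which test(item) is true, else raise NotFound."""
    for item in items:
        if test(item):
            return item              # Any object, even None, may be a result
    raise NotFound()

data = [3, 'spam', None, 4.5]

try:
    item = first_match(data, lambda x: x is None)
except NotFound:
    outcome = 'no match'
else:
    outcome = ('matched', item)      # item is None here, a legitimate result

try:
    first_match(data, lambda x: isinstance(x, dict))
except NotFound:
    missing = True
else:
    missing = False

print(outcome, missing)
```
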


Closing Files and Server Connections 


We encountered examples in this category in Chapter 33. As a summary, though, ex- 
ception processing tools are also commonly used to ensure that system resources are 
finalized, regardless of whether an error occurs during processing or not. 


For example, some servers require connections to be closed in order to terminate a 
session. Similarly, output files may require close calls to flush their buffers to disk, and 
input files may consume file descriptors if not closed; although file objects are auto- 
matically closed when garbage collected if still open, it’s sometimes difficult to be sure 
when that will occur. 


The most general and explicit way to guarantee termination actions for a specific block 
of code is the try/finally statement: 

myfile = open(r'C:\misc\script', 'w')
try:
    ...process myfile...
finally:
    myfile.close()

As we saw in Chapter 33, some objects make this easier in Python 2.6 and 3.0 by 
providing context managers run by the with/as statement that terminate or close the 
objects for us automatically: 

with open(r'C:\misc\script', 'w') as myfile:
    ...process myfile...

So which option is better here? As usual, it depends on your programs. Compared to 
the try/finally, context managers are more implicit, which runs contrary to Python’s 
general design philosophy. Context managers are also arguably less general: they are 
available only for select objects, and writing user-defined context managers to handle 
general termination requirements is more complex than coding a try/finally. 


On the other hand, using existing context managers requires less code than using try/ 
finally, as shown by the preceding examples. Moreover, the context manager protocol 
supports entry actions in addition to exit actions. Although the try/finally is perhaps 
the more widely applicable technique, context managers may be more appropriate 
where they are already available, or where their extra complexity is warranted. 
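
To give a feel for the extra work involved, here is a minimal sketch of a user-defined context manager with both an entry and an exit action. The Resource class and its history list are invented for this illustration; a real manager would open and close an actual resource in the same two methods:

```python
class Resource:
    """Hypothetical resource that records its own entry and exit actions."""
    def __init__(self, name):
        self.name = name
        self.history = []

    def __enter__(self):
        self.history.append('open')      # Entry action
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.history.append('close')     # Exit action, run on errors too
        return False                     # False: do not swallow the exception

res = Resource('demo')
try:
    with res as r:
        r.history.append('work')
        raise ValueError('failed mid-block')
except ValueError:
    res.history.append('handled')

print(res.history)
```

The exit action runs before the exception reaches the enclosing handler, which is the same guarantee a try/finally would give, at the cost of writing a class.
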


Debugging with Outer try Statements 


You can also make use of exception handlers to replace Python’s default top-level 
exception-handling behavior. By wrapping an entire program (or a call to it) in an outer 
try in your top-level code, you can catch any exception that may occur while your 
program runs, thereby subverting the default program termination. 


In the following, the empty except clause catches any uncaught exception raised while 
the program runs. To get hold of the actual exception that occurred, fetch the 
sys.exc_info function call result from the built-in sys module; it returns a tuple whose 
first two items contain the current exception’s class and the instance object raised (more 
on sys.exc_info in a moment): 


try:
    ...run program...
except:                          # All uncaught exceptions come here
    import sys
    print('uncaught!', sys.exc_info()[0], sys.exc_info()[1])


This structure is commonly used during development, to keep programs active even 
after errors occur—it allows you to run additional tests without having to restart. It’s 
also used when testing other program code, as described in the next section. 


Running In-Process Tests 


You might combine some of the coding patterns we’ve just looked at in a test-driver 
application that tests other code within the same process: 

import sys
log = open('testlog', 'a')
from testapi import moreTests, runNextTest, testName

def testdriver():
    while moreTests():
        try:
            runNextTest()
        except:
            print('FAILED', testName(), sys.exc_info()[:2], file=log)
        else:
            print('PASSED', testName(), file=log)

testdriver()


The testdriver function here cycles through a series of test calls (the module testapi 
is left abstract in this example). Because an uncaught exception in a test case would 
normally kill this test driver, you need to wrap test case calls in a try if you want to 
continue the testing process after a test fails. The empty except catches any uncaught 
exception generated by a test case as usual, and it uses sys.exc_info to log the exception 
to a file. The else clause is run when no exception occurs—the test success case. 


Such boilerplate code is typical of systems that test functions, modules, and classes by 
running them in the same process as the test driver. In practice, however, testing can 
be much more sophisticated than this. For instance, to test external programs, you 
could instead check status codes or outputs generated by program-launching tools such 
as os.system and os.popen, covered in the standard library manual (such tools do not 
generally raise exceptions for errors in the external programs—in fact, the test cases 
may run in parallel with the test driver). 


At the end of this chapter, we’ll also meet some more complete testing frameworks 
provided by Python, such as doctest and PyUnit, which provide tools for comparing 
expected outputs with actual results. 


More on sys.exc_info 


The sys.exc_info result used in the last two sections allows an exception handler to 
gain access to the most recently raised exception generically. This is especially useful 
when using the empty except clause to catch everything blindly, to determine what was 
raised: 


try:
    ...
except:
    # sys.exc_info()[0:2] are the exception class and instance


If no exception is being handled, this call returns a tuple containing three None values. 
Otherwise, the values returned are (type, value, traceback), where: 


• type is the exception class of the exception being handled. 

• value is the exception class instance that was raised. 

• traceback is a traceback object that represents the call stack at the point where the 
  exception originally occurred (see the traceback module’s documentation for tools 
  that may be used in conjunction with this object to generate error messages 
  manually). 
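
Here is a brief sketch of all three items in use, with the standard traceback module turning the traceback object into text. The fetch function is a throwaway of my own that exists only to fail:

```python
import sys
import traceback

def fetch():
    return {}['spam']                    # Raises KeyError

try:
    fetch()
except:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # format_exception returns a list of strings making up the usual report
    report = ''.join(traceback.format_exception(exc_type, exc_value, exc_tb))

print(exc_type, repr(exc_value))
print(report)
```

The report text is the same message Python's default top-level handler would have printed, but the program stays in control of where it goes.
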


As we saw in the prior chapter, sys.exc_info can also sometimes be useful to determine 
the specific exception type when catching exception category superclasses. As we saw, 
though, because in this case you can also get the exception type by fetching the 
__class__ attribute of the instance obtained with the as clause, sys.exc_info is mostly 
used by the empty except today: 


try:
    ...
except General as instance:
    # instance.__class__ is the exception class


That said, using the instance object’s interfaces and polymorphism is often a better 
approach than testing exception types, because exception methods can be defined per class 
and run generically: 

try:
    ...
except General as instance:
    # instance.method() does the right thing for this instance


As usual, being too specific in Python can limit your code’s flexibility. A polymorphic 
approach like the last example here generally supports future evolution better. 
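
The polymorphic approach might look like the following in practice. The class names and the describe method are hypothetical; the point is only that the handler never inspects the exception's type:

```python
class General(Exception):
    def describe(self):
        return 'general problem'

class Specific1(General):
    def describe(self):
        return 'specific problem 1'

class Specific2(General):
    pass                                 # Inherits describe from General

def raiser(exc_class):
    raise exc_class()

descriptions = []
for exc_class in (General, Specific1, Specific2):
    try:
        raiser(exc_class)
    except General as instance:
        # No type testing: each instance supplies its own behavior
        descriptions.append((instance.__class__.__name__, instance.describe()))

print(descriptions)
```

A new subclass added later works with this handler unchanged, whether it overrides describe or inherits it.
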


Version skew note: In Python 2.6, the older tools sys.exc_type and 
sys.exc_value still work to fetch the most recent exception type and 
value, but they can manage only a single, global exception for the entire 
process. These two names have been removed in Python 3.0. The newer 
and preferred sys.exc_info() call available in both 2.6 and 3.0 instead 
keeps track of each thread’s exception information, and so is thread-specific. 
Of course, this distinction matters only when using multiple 
threads in Python programs (a subject beyond this book’s scope), but 
3.0 forces the issue. See other resources for more details. 


Exception Design Tips and Gotchas 


I’m lumping design tips and gotchas together in this chapter, because it turns out that 
the most common gotchas largely stem from design issues. By and large, exceptions 
are easy to use in Python. The real art behind them is in deciding how specific or general 
your except clauses should be and how much code to wrap up in try statements. Let’s 
address the second of these concerns first. 


What Should Be Wrapped 


In principle, you could wrap every statement in your script in its own try, but that 
would just be silly (the try statements would then need to be wrapped in try state- 
ments!). What to wrap is really a design issue that goes beyond the language itself, and 
it will become more apparent with use. But for now, here are a few rules of thumb: 


• Operations that commonly fail should generally be wrapped in try statements. For 
  example, operations that interface with system state (file opens, socket calls, and 
  the like) are prime candidates for trys. 

• However, there are exceptions to the prior rule: in a simple script, you may 
  want failures of such operations to kill your program instead of being caught and 
  ignored. This is especially true if the failure is a showstopper. Failures in Python 
  typically result in useful error messages (not hard crashes), and this is often the 
  best outcome you could hope for. 

• You should implement termination actions in try/finally statements to guarantee 
  their execution, unless a context manager is available as a with/as option. The 
  try/finally statement form allows you to run code whether exceptions occur or not 
  in arbitrary scenarios. 

• It is sometimes more convenient to wrap the call to a large function in a single 
  try statement, rather than littering the function itself with many try statements. 
  That way, all exceptions in the function percolate up to the try around the call, 
  and you reduce the amount of code within the function. 
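
The last rule of thumb can be sketched as follows. The load_settings parser and the safe_load wrapper are hypothetical names for this example; the parser contains no try statements of its own, and a single try around the call covers every step inside it:

```python
def load_settings(text):
    """Parse 'name=value' lines; hypothetical helper with no try statements."""
    settings = {}
    for line in text.splitlines():
        name, value = line.split('=')         # ValueError if the line is malformed
        settings[name.strip()] = int(value)   # ValueError if not an integer
    return settings

def safe_load(text):
    try:
        return load_settings(text)            # One try around the whole call
    except ValueError as exc:
        return {'error': str(exc)}

good = safe_load('width=80\nheight=24')
bad = safe_load('width=eighty')
print(good)
print(bad)
```
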


The types of programs you write will probably influence the amount of exception 
handling you code as well. Servers, for instance, must generally keep running persistently 
and so will likely require try statements to catch and recover from exceptions. 
In-process testing programs of the kind we saw in this chapter will probably handle 
exceptions as well. Simpler one-shot scripts, though, will often ignore exception handling 
completely because failure at any step requires script shutdown. 


Catching Too Much: Avoid Empty except and Exception 


On to the issue of handler generality. Python lets you pick and choose which exceptions 
to catch, but you sometimes have to be careful to not be too inclusive. For example, 
you’ve seen that an empty except clause catches every exception that might be raised 
while the code in the try block runs. 


That’s easy to code, and sometimes desirable, but you may also wind up intercepting 
an error that’s expected by a try handler higher up in the exception nesting structure. 
For example, an exception handler such as the following catches and stops every ex- 
ception that reaches it, regardless of whether another handler is waiting for it: 


def func():
    try:
        ...                      # IndexError is raised in here
    except:
        ...                      # But everything comes here and dies!

try:
    func()
except IndexError:               # Exception should be processed here
    ...


Perhaps worse, such code might also catch unrelated system exceptions. Even things 
like memory errors, genuine programming mistakes, iteration stops, keyboard inter- 
rupts, and system exits raise exceptions in Python. Such exceptions should not usually 
be intercepted. 


For example, scripts normally exit when control falls off the end of the top-level file. 
However, Python also provides a built-in sys.exit(statuscode) call to allow early 
terminations. This actually works by raising a built-in SystemExit exception to end the 
program, so that try/finally handlers run on the way out and special types of programs 
can intercept the event.* Because of this, a try with an empty except might unknowingly 
prevent a crucial exit, as in the following file (exiter.py): 

import sys

def bye():
    sys.exit(40)                 # Crucial error: abort now!

try:
    bye()
except:
    print('got it')              # Oops--we ignored the exit
print('continuing...')

% python exiter.py
got it
continuing...

* A related call, os._exit, also ends a program, but via an immediate termination: it skips cleanup actions 
and cannot be intercepted with try/except or try/finally blocks. It is usually only used in spawned child 
processes, a topic beyond this book’s scope. See the library manual or follow-up texts for details. 

You simply might not expect all the kinds of exceptions that could occur during an 
operation. Using the built-in exception classes of the prior chapter can help in this 
particular case, because the Exception superclass is not a superclass of SystemExit: 

try:
    bye()
except Exception:                # Won't catch exits, but _will_ catch many others
    ...


In other cases, though, this scheme is no better than an empty except clause—because 
Exception is a superclass above all built-in exceptions except system-exit events, it still 
has the potential to catch exceptions meant for elsewhere in the program. 
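
The difference can be checked directly. In this sketch, the guarded helper is an invented name, and BaseException (the root of the built-in exception hierarchy in Python 3) stands in for the empty except, since both catch everything:

```python
import sys

def bye():
    sys.exit(40)                         # Raises SystemExit(40)

def guarded(handler_class):
    """Call bye inside a try that catches handler_class; report what happened."""
    try:
        bye()
    except handler_class:
        return 'intercepted'

try:
    outcome_exception = guarded(Exception)
except SystemExit as exit_request:
    outcome_exception = 'exit passed through with code %s' % exit_request.code

outcome_base = guarded(BaseException)    # Behaves like an empty except here

print(outcome_exception)
print(outcome_base)
```

The outer try exists only to keep this demonstration alive; in a real script, the exit request would simply end the program as intended.
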


Probably worst of all, both an empty except and catching the Exception class will also 
catch genuine programming errors, which should be allowed to pass most of the time. 
In fact, these two techniques can effectively turn off Python’s error-reporting ma- 
chinery, making it difficult to notice mistakes in your code. Consider this code, for 
example: 


mydictionary = {...}

try:
    x = myditctionary['spam']    # Oops: misspelled
except:
    x = None                     # Assume we got KeyError

...continue here with x...


The coder here assumes that the only sort of error that can happen when indexing a 
dictionary is a missing key error. But because the name myditctionary is misspelled (it 
should say mydictionary), Python raises a NameError instead for the undefined name 
reference, which the handler will silently catch and ignore. The handler will 
incorrectly fill in a default for the dictionary access, masking the program error. Moreover, 
catching Exception here would have the exact same effect as an empty except. If this 
happens in code that is far removed from the place where the fetched values are used, 
it might make for a very interesting debugging task! 


As a rule of thumb, be as specific in your handlers as you can be—empty except clauses 
and Exception catchers are handy, but potentially error-prone. In the last example, for 
instance, you would be better off saying except KeyError: to make your intentions 
explicit and avoid intercepting unrelated events. In simpler scripts, the potential for 
problems might not be significant enough to outweigh the convenience of a catchall, 
but in general, general handlers are generally trouble. 
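
A side-by-side sketch makes the contrast concrete. Both functions below contain the same deliberate misspelling; the function names and the sample dictionary are made up for the comparison:

```python
mydictionary = {'eggs': 2}

def lookup_catchall():
    try:
        x = myditctionary['spam']        # Misspelled on purpose
    except:
        x = None                         # NameError is silently masked
    return x

def lookup_specific():
    try:
        x = myditctionary['spam']        # Same misspelling
    except KeyError:
        x = None                         # Only a missing key lands here
    return x

masked = lookup_catchall()               # Returns None; the bug stays hidden

try:
    lookup_specific()
except NameError as exc:
    surfaced = type(exc).__name__        # The typo is reported instead

print(masked, surfaced)
```

With the specific handler, the NameError escapes and would normally produce an error message pointing straight at the misspelled line.
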


Catching Too Little: Use Class-Based Categories 


On the other hand, neither should handlers be too specific. When you list specific 
exceptions in a try, you catch only what you actually list. This isn’t necessarily a bad 
thing, but if a system evolves to raise other exceptions in the future, you may need to 
go back and add them to exception lists elsewhere in your code. 


We saw this phenomenon at work in the prior chapter. For instance, the following 
handler is written to treat MyExcept1 and MyExcept2 as normal cases and everything else 
as an error. Therefore, if you add a MyExcept3 in the future, it will be processed as an 
error unless you update the exception list: 


try:
    ...
except (MyExcept1, MyExcept2):   # Breaks if you add a MyExcept3
    ...                          # Non-errors
else:
    ...                          # Assumed to be an error


Luckily, careful use of the class-based exceptions we discussed in Chapter 33 can make 
this trap go away completely. As we saw, if you catch a general superclass, you can add 
and raise more specific subclasses in the future without having to extend except clause 
lists manually—the superclass becomes an extendible exceptions category: 


try:
    ...
except SuccessCategoryName:      # OK if I add a myerror3 subclass
    ...                          # Non-errors
else:
    ...                          # Assumed to be an error


In other words, a little design goes a long way. The moral of the story is to be careful 
to be neither too general nor too specific in exception handlers, and to pick the gran- 
ularity of your try statement wrappings wisely. Especially in larger systems, exception 
policies should be a part of the overall design. 
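
As a runnable sketch of the category idea, the handler below names only a superclass, and a subclass defined after the handler was written is caught without any change to it. All class and function names here are placeholders of my own:

```python
class SuccessCategory(Exception):
    """Hypothetical superclass naming a whole category of non-error signals."""

class MyExcept1(SuccessCategory): pass
class MyExcept2(SuccessCategory): pass

def classify(action):
    try:
        action()
    except SuccessCategory as exc:       # Catches all current and future subclasses
        return 'non-error: ' + type(exc).__name__
    else:
        return 'no signal raised'

class MyExcept3(SuccessCategory): pass   # Added later; classify is unchanged

def raise3():
    raise MyExcept3()

outcome = classify(raise3)
print(outcome)
```
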


Core Language Summary 


Congratulations! This concludes your look at the core Python programming language. 
If you’ve gotten this far, you may consider yourself an Official Python Programmer (and 
should feel free to add Python to your résumé the next time you dig it out). You’ve 
already seen just about everything there is to see in the language itself, and all in much 
more depth than many practicing Python programmers initially do. You’ve studied 
built-in types, statements, and exceptions, as well as tools used to build up larger pro- 
gram units (functions, modules, and classes); you’ve even explored important design 
issues, OOP, program architecture, and more. 


The Python Toolset 


From this point forward, your future Python career will largely consist of becoming 
proficient with the toolset available for application-level Python programming. You’ll 
find this to be an ongoing task. The standard library, for example, contains hundreds 
of modules, and the public domain offers still more tools. It’s possible to spend a decade 
or more seeking proficiency with all these tools, especially as new ones are constantly 
appearing (trust me on this!). 


Speaking generally, Python provides a hierarchy of toolsets: 


Built-ins 
Built-in types like strings, lists, and dictionaries make it easy to write simple pro- 
grams fast. 


Python extensions 
For more demanding tasks, you can extend Python by writing your own functions, 
modules, and classes. 


Compiled extensions 
Although we don’t cover this topic in this book, Python can also be extended with 
modules written in an external language like C or C++. 


Because Python layers its toolsets, you can decide how deeply your programs need to 
delve into this hierarchy for any given task—you can use built-ins for simple scripts, 
add Python-coded extensions for larger systems, and code compiled extensions for 
advanced work. We’ve only covered the first two of these categories in this book, and 
that’s plenty to get you started doing substantial programming in Python. 


Table 35-1 summarizes some of the sources of built-in or existing functionality available 
to Python programmers, and some topics you’ll probably be busy exploring for the 
remainder of your Python career. Up until now, most of our examples have been very 
small and self-contained. They were written that way on purpose, to help you master 
the basics. But now that you know all about the core language, it’s time to start learning 
how to use Python’s built-in interfaces to do real work. You'll find that with a simple 
language like Python, common tasks are often much easier than you might expect. 


Table 35-1. Python’s toolbox categories 

Category            Examples
Object types        Lists, dictionaries, files, strings
Functions           len, range, open
Exceptions          IndexError, KeyError
Modules             os, tkinter, pickle, re
Attributes          __dict__, __name__, __class__
Peripheral tools    NumPy, SWIG, Jython, IronPython, Django, etc.


Development Tools for Larger Projects 


Once you’ve mastered the basics, you'll find your Python programs becoming sub- 
stantially larger than the examples you’ve experimented with so far. For developing 
larger systems, a set of development tools is available in Python and the public domain. 
You’ve seen some of these in action, and I’ve mentioned a few others. To help you on 
your way, here is a summary of some of the most commonly used tools in this domain: 


PyDoc and docstrings 
PyDoc’s help function and HTML interfaces were introduced in Chapter 15. PyDoc 
provides a documentation system for your modules and objects and integrates with 
Python’s docstrings feature. It is a standard part of the Python system—see the 
library manual for more details. Be sure to also refer back to the documentation 
source hints listed in Chapter 4 for information on other Python information 
resources. 


PyChecker and PyLint 

Because Python is such a dynamic language, some programming errors are not 
reported until your program runs (e.g., syntax errors are caught when a file is run 
or imported). This isn’t a big drawback—as with most languages, it just means 
that you have to test your Python code before shipping it. At worst, with Python 
you essentially trade a compile phase for an initial testing phase. Furthermore, 
Python’s dynamic nature, automatic error messages, and exception model make it 
easier and quicker to find and fix errors in Python than it is in some other languages 
(unlike C, for example, Python does not crash on errors). 


The PyChecker and PyLint systems provide support for catching a large set of 
common errors ahead of time, before your script runs. They serve similar roles to 
the lint program in C development. Some Python groups run their code through 
PyChecker prior to testing or delivery, to catch any lurking potential problems. In 
fact, the Python standard library is regularly run through PyChecker before release. 
PyChecker and PyLint are third-party open source packages; you can find them at 
http://www.python.org or the PyPI website, or via your friendly neighborhood web 
search engine. 


PyUnit (a.k.a. unittest) 

In Chapter 24, we learned how to add self-test code to a Python file by using the 
__name__ == '__main__' trick at the bottom of the file. For more advanced testing 
purposes, Python comes with two testing support tools. The first, PyUnit (called 
unittest in the library manual), provides an object-oriented class framework for 
specifying and customizing test cases and expected results. It mimics the JUnit 
framework for Java. This is a sophisticated class-based unit testing system; see the 
Python library manual for details. 
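
As a small taste of the framework, the following sketch tests a made-up average function. The function and the test class are assumptions for the example; the unittest calls themselves are standard. The report is captured in a string here only to keep the example quiet:

```python
import io
import unittest

def average(values):
    return sum(values) / len(values)

class AverageTests(unittest.TestCase):
    def test_typical(self):
        self.assertEqual(average([1, 2, 3]), 2)

    def test_empty(self):
        # assertRaises passes only if the expected exception is raised
        with self.assertRaises(ZeroDivisionError):
            average([])

suite = unittest.TestLoader().loadTestsFromTestCase(AverageTests)
stream = io.StringIO()                   # Capture the report instead of printing
result = unittest.TextTestRunner(stream=stream).run(suite)
print(result.testsRun, result.wasSuccessful())
```

In a test file of your own, a call to unittest.main() under the __name__ == '__main__' test is the more usual way to run the cases.
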


doctest 


The doctest standard library module provides a second and simpler approach to 
regression testing, based upon Python’s docstrings feature. Roughly, to use 
doctest, you cut and paste a log of an interactive testing session into the docstrings 
of your source files. doctest then extracts your docstrings, parses out the test cases 
and results, and reruns the tests to verify the expected results. doctest’s operation 
can be tailored in a variety of ways; see the library manual for more details. 
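
Here is a minimal sketch of the idea, using a hypothetical square function whose docstring holds a pasted interactive session. To keep the example self-contained, it drives doctest's parser and runner classes directly on that one docstring:

```python
import doctest

def square(x):
    """
    Return x multiplied by itself.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x

# Parse the docstring into a test and rerun its recorded session
parser = doctest.DocTestParser()
test = parser.get_doctest(square.__doc__, {'square': square},
                          'square', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

In a module file, the more common form is a single call to doctest.testmod() under the __name__ == '__main__' test, which finds and runs every docstring session in the module.
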


IDEs 

We discussed IDEs for Python in Chapter 3. IDEs such as IDLE provide a graphical 
environment for editing, running, debugging, and browsing your Python 
programs. Some advanced IDEs (such as Eclipse, Komodo, NetBeans, and Wing 
IDE) may support additional development tasks, including source control inte- 
gration, code refactoring, project management tools, and more. See Chapter 3, the 
text editors page at http://www.python.org, and your favorite web search engine for 
more on available IDEs and GUI builders for Python. 


Profilers 
Because Python is so high-level and dynamic, intuitions about performance 
gleaned from experience with other languages usually don’t apply to Python code. 
To truly isolate performance bottlenecks in your code, you need to add timing logic 
with clock tools in the time or timeit modules, or run your code under the 
profile module. We saw an example of the timing modules at work when com- 
paring iteration tools’ speeds in Chapter 20. Profiling is usually your first optimi- 
zation step—profile to isolate bottlenecks, then time alternative codings of them. 


profile is a standard library module that implements a source code profiler for 
Python; it runs a string of code you provide (e.g., a script file import, or a call to a
function) and then, by default, prints a report to the standard output stream that 
gives performance statistics—number of calls to each function, time spent in each 
function, and more. 


The profile module can be run as a script or imported, and it may be customized 
in various ways; for example, it can save run statistics to a file to be analyzed later 
with the pstats module. To profile interactively, import the profile module and 
call profile.run('code'), passing in the code you wish to profile as a string (e.g., 
a call to a function, or an import of an entire file). To profile from a system shell 
command line, use a command of the form python -m profile main.py args... 
(see Appendix A for more on this format). Also see Python’s standard library man- 
uals for other profiling options; the cProfile module, for example, has identical 
interfaces to profile but runs with less overhead, so it may be better suited to 
profiling long-running programs. 
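
The following sketch shows both approaches side by side, under the assumption of a hypothetical workload function named build_squares (invented for this example). It times the function with timeit, then profiles a single call with cProfile and formats the statistics with pstats.

```python
import cProfile
import io
import pstats
import timeit

def build_squares(n):
    # Hypothetical workload, invented for this sketch
    return [i * i for i in range(n)]

# Timing: total seconds taken by 100 calls
elapsed = timeit.timeit(lambda: build_squares(1000), number=100)
print('timeit total:', elapsed)

# Profiling: collect statistics for one call
profiler = cProfile.Profile()
profiler.runcall(build_squares, 100000)

# Format the report into a string instead of printing it directly
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The exact numbers vary per machine and Python release, so treat the report as a guide to where time goes rather than as an absolute measurement.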
Debuggers 

We also discussed debugging options in Chapter 3 (see its sidebar “Debugging 
Python Code” on page 67). As a review, most development IDEs for Python support 
GUI-based debugging, and the Python standard library also includes a source code 
debugger module called pdb. This module provides a command-line interface and 
works much like common C language debuggers (e.g., dbx, gdb). 
Much like the profiler, the pdb debugger can be run either interactively or from a
command line and can be imported and called from a Python program. To use it 
interactively, import the module, start running code by calling a pdb function (e.g., 
pdb.run("main()")), and then type debugging commands from pdb’s interactive 
prompt. To launch pdb from a system shell command line, use a command of the 
form python -m pdb main.py args... (see Appendix A for more on this format). 
pdb also includes a useful postmortem analysis call, pdb.pm(), which starts the 
debugger after an exception has been encountered. 


Because IDEs such as IDLE also include point-and-click debugging interfaces, 
pdb isn’t as critical a tool today, except when a GUI isn’t available or when more
control is desired. See Chapter 3 for tips on using IDLE’s debugging GUI interfaces.
Really, neither pdb nor IDEs seem to be used much in practice—as noted in Chap- 
ter 3, most programmers either insert print statements or simply read Python’s 
error messages (not the most high-tech of approaches, but the practical tends to 
win the day in the Python world!). 
Shipping options 

In Chapter 2, we introduced common tools for packaging Python programs. 
py2exe, PyInstaller, and freeze can package byte code and the Python Virtual Ma- 
chine into “frozen binary” standalone executables, which don’t require that Python 
be installed on the target machine and fully hide your system’s code. In addition, 
we learned in Chapter 2 that Python programs may be shipped in their source 
(.py) or byte code (.pyc) forms, and that import hooks support special packaging 
techniques such as automatic extraction of .zip files and byte code encryption. 


We also briefly met the standard library’s distutils modules, which provide
packaging options for Python modules and packages, and C-coded extensions; see the
Python manuals for more details. The emerging Python “eggs” third-party pack- 
aging system provides another alternative that also accounts for dependencies; 
search the Web for more details. 
Optimization options 

There are a couple of options for optimizing your programs. The Psyco system 
described in Chapter 2 provides a just-in-time compiler for translating Python byte 
code to binary machine code, and Shedskin offers a Python-to-C++ translator. You 
may also occasionally see .pyo optimized byte code files, generated and run with 
the -O Python command-line flag (discussed in Chapters 21 and 33); because this
provides a very modest performance boost, however, it is not commonly used. 


As a last resort, you can also move parts of your program to a compiled language 
such as C to boost performance; see the book Programming Python and the Python 
standard manuals for more on C extensions. In general, Python’s speed also im- 
proves over time, so be sure to upgrade to the faster releases when possible. 
Other hints for larger projects
We've met a variety of language features in this text that will tend to become more 
useful once you start coding larger projects. These include module packages 
(Chapter 23), class-based exceptions (Chapter 33), class pseudoprivate attributes 
(Chapter 30), documentation strings (Chapter 15), module path configuration files 
(Chapter 21), hiding names from from * with __all__ lists and _X-style names
(Chapter 24), adding self-test code with the __name__ == '__main__' trick
(Chapter 24), using common design rules for functions and modules (Chapters 17, 19,
and 24), using object-oriented design patterns (Chapter 30 and others), and so on. 


To learn about other large-scale Python development tools available in the public do- 
main, be sure to browse the pages at the PyPI website at http://www.python.org, and 
the Web at large. 


Chapter Summary 


This chapter wrapped up the exceptions part of the book with a survey of related state- 
ments, a look at common exception use cases, and a brief summary of commonly used 
development tools. 


This chapter also wrapped up the core material of this book. At this point, you’ve been 
exposed to the full subset of Python that most programmers use. In fact, if you have 
read this far, you should feel free to consider yourself an official Python programmer. 
Be sure to pick up a t-shirt the next time you’re online. 


The next and final part of this book is a collection of chapters dealing with topics that 
are advanced, but still in the core language category. These chapters are all optional 
reading, because not every Python programmer must delve into their subjects; indeed, 
most of you can stop here and begin exploring Python’s roles in your application do- 
mains. Frankly, application libraries tend to be more important in practice than ad- 
vanced (and to some, esoteric) language features. 


On the other hand, if you do need to care about things like Unicode or binary data, 
have to deal with API-building tools such as descriptors, decorators, and metaclasses, 
or just want to dig a bit further in general, the next part of the book will help you get 
started. The larger examples in the final part will also give you a chance to see the 
concepts you’ve already learned being applied in more realistic ways.


As this is the end of the core material of this book, you get a break on the chapter quiz:
just one question this time. As always, though, be sure to work through this part’s
closing exercises to cement what you’ve learned in the past few chapters; because the 
next part is optional reading, this is the final end-of-part exercises session. If you want 
to see some examples of how what you’ve learned comes together in real scripts drawn
from common applications, check out the “solution” to exercise 4 in Appendix B. 
Test Your Knowledge: Quiz


1. (This question is a repeat from the first quiz in Chapter 1—see, I told you it would 
be easy! :-) Why does “spam” show up in so many Python examples in books and 
on the Web? 


Test Your Knowledge: Answers 


1. Because Python is named after the British comedy group Monty Python (based on 
surveys I’ve conducted in classes, this is a much-too-well-kept secret in the Python 
world!). The spam reference comes from a Monty Python skit, where a couple who 
are trying to order food in a cafeteria keep getting drowned out by a chorus of 
Vikings singing a song about spam. No, really. And if I could insert an audio clip 
of that song here, I would... 


Test Your Knowledge: Part VII Exercises 


As we’ve reached the end of this part of the book, it’s time for a few exception exercises 
to give you a chance to practice the basics. Exceptions really are simple tools; if you get 
these, you’ve probably mastered exceptions. 


See “Part VII, Exceptions and Tools” on page 1130 in Appendix B for the solutions. 


1. try/except. Write a function called oops that explicitly raises an IndexError excep- 
tion when called. Then write another function that calls oops inside a try/except 
statement to catch the error. What happens if you change oops to raise a 
KeyError instead of an IndexError? Where do the names KeyError and IndexError 
come from? (Hint: recall that all unqualified names come from one of four scopes.) 


2. Exception objects and lists. Change the oops function you just wrote to raise an 
exception you define yourself, called MyError. Identify your exception with a class. 
Then, extend the try statement in the catcher function to catch this exception and 
its instance in addition to IndexError, and print the instance you catch. 


3. Error handling. Write a function called safe(func, *args) that runs any function 
with any number of arguments by using the *name arbitrary arguments call syntax, 
catches any exception raised while the function runs, and prints the exception using 
the exc_info call in the sys module. Then use your safe function to run your 
oops function from exercise 1 or 2. Put safe in a module file called tools.py, and 
pass it the oops function interactively. What kind of error messages do you get? 
Finally, expand safe to also print a Python stack trace when an error occurs by 
calling the built-in print_exc function in the standard traceback module (see the 
Python library reference manual for details). 
4. Self-study examples. At the end of Appendix B, I’ve included a handful of example
scripts developed as group exercises in live Python classes for you to study and run 
on your own in conjunction with Python’s standard manual set. These are not 
described, and they use tools in the Python standard library that you’ll have to 
research on your own. Still, for many readers, it helps to see how the concepts 
we've discussed in this book come together in real programs. If these whet your 
appetite for more, you can find a wealth of larger and more realistic 
application-level Python program examples in follow-up books like Programming 
Python and on the Web. 
PART VIII


Advanced Topics 


CHAPTER 36 
Unicode and Byte Strings 


In the strings chapter in the core types part of this book (Chapter 7), I deliberately 
limited the scope to the subset of string topics that most Python programmers need to 
know about. Because the vast majority of programmers deal with simple forms of text 
like ASCII, they can happily work with Python’s basic str string type and its associated 
operations and don’t need to come to grips with more advanced string concepts. In 
fact, such programmers can largely ignore the string changes in Python 3.0 and continue 
to use strings as they may have in the past. 


On the other hand, some programmers deal with more specialized types of data: non- 
ASCII character sets, image file contents, and so on. For those programmers (and others 
who may join them some day), in this chapter we’re going to fill in the rest of the Python
string story and look at some more advanced concepts in Python’s string model. 


Specifically, we’ll explore the basics of Python’s support for Unicode text— 
wide-character strings used in internationalized applications—as well as binary data— 
strings that represent absolute byte values. As we'll see, the advanced string 
representation story has diverged in recent versions of Python: 


* Python 3.0 provides an alternative string type for binary data and supports Unicode
text in its normal string type (ASCII is treated as a simple type of Unicode).


* Python 2.6 provides an alternative string type for non-ASCII Unicode text and
supports both simple text and binary data in its normal string type.


In addition, because Python’s string model has a direct impact on how you process 
non-ASCII files, we'll explore the fundamentals of that related topic here as well. Fi- 
nally, we'll take a brief look at some advanced string and binary tools, such as pattern 
matching, object pickling, binary data packing, and XML parsing, and the ways in 
which they are impacted by 3.0’s string changes. 


This is officially an advanced topics chapter, because not all programmers will need to 
delve into the worlds of Unicode encodings or binary data. If you ever need to care 
about processing either of these, though, you’ll find that Python’s string models provide 
the support you need. 
String Changes in 3.0


One of the most noticeable changes in 3.0 is the mutation of string object types. In a 
nutshell, 2.X’s str and unicode types have morphed into 3.0’s str and bytes types, and 
a new mutable bytearray type has been added. The bytearray type is technically avail- 
able in Python 2.6 too (though not earlier), but it’s a back-port from 3.0 and does not 
as clearly distinguish between text and binary content in 2.6. 


Especially if you process data that is either Unicode or binary in nature, these changes 
can have substantial impacts on your code. In fact, as a general rule of thumb, how 
much you need to care about this topic depends in large part upon which of the fol- 
lowing categories you fall into: 


* If you deal with non-ASCII Unicode text (for instance, in the context of
internationalized applications and the results of some XML parsers), you will find
support for text encodings to be different in 3.0, but also probably more direct,
accessible, and seamless than in 2.6.


* If you deal with binary data (for example, in the form of image or audio files or
packed data processed with the struct module), you will need to understand 3.0’s
new bytes object and 3.0’s different and sharper distinction between text and
binary data and files.


* If you fall into neither of the prior two categories, you can generally use strings in
3.0 much as you would in 2.6: with the general str string type, text files, and all
the familiar string operations we studied earlier. Your strings will be encoded and
decoded using your platform’s default encoding (e.g., ASCII, or UTF-8 on Windows
in the U.S.; sys.getdefaultencoding() gives your default if you care to check), but
you probably won’t notice.


In other words, if your text is always ASCII, you can get by with normal string objects 
and text files and can avoid most of the following story. As we’ll see in a moment, ASCII 
is a simple kind of Unicode and a subset of other encodings, so string operations and 
files “just work” if your programs process ASCII text. 


Even if you fall into the last of the three categories just mentioned, though, a basic 
understanding of 3.0’s string model can help both to demystify some of the underlying 
behavior now, and to make mastering Unicode or binary data issues easier if they impact 
you in the future. 


Python 3.0’s support for Unicode and binary data is also available in 2.6, albeit in 
different forms. Although our main focus in this chapter is on string types in 3.0, we’ll 
explore some 2.6 differences along the way too. Regardless of which version you use, 
the tools we’ll explore here can become important in many types of programs. 
String Basics


Before we look at any code, let’s begin with a general overview of Python’s string model. 
To understand why 3.0 changed the way it did on this front, we have to start with a 
brief look at how characters are actually represented in computers. 


Character Encoding Schemes 


Most programmers think of strings as series of characters used to represent textual data. 
The way characters are stored in a computer’s memory can vary, though, depending 
on what sort of character set must be recorded. 


The ASCII standard was created in the U.S., and it defines many U.S. programmers’ 
notion of text strings. ASCII defines character codes from 0 through 127 and allows 
each character to be stored in one 8-bit byte (only 7 bits of which are actually used). 
For example, the ASCII standard maps the character 'a' to the integer value 97 (0x61 
in hex), which is stored in a single byte in memory and files. If you wish to see how this 
works, Python’s ord built-in function gives the binary value for a character, and chr 
returns the character for a given integer code value: 


>>> ord('a')        # 'a' is a byte with binary value 97 in ASCII
97
>>> hex(97)
'0x61'
>>> chr(97)         # Binary value 97 stands for character 'a'
'a'


Sometimes one byte per character isn’t enough, though. Various symbols and accented 
characters, for instance, do not fit into the range of possible characters defined by 
ASCII. To accommodate special characters, some standards allow all possible values 
in an 8-bit byte, 0 through 255, to represent characters, and assign the values 128 
through 255 (outside ASCII’s range) to special characters. One such standard, known 
as Latin-1, is widely used in Western Europe. In Latin-1, character codes above 127 
are assigned to accented and otherwise special characters. The character assigned to 
byte value 196, for example, is a specially marked non-ASCII character: 

>>> 0xC4
196
>>> chr(196)
'Ä'

This standard allows for a wide array of extra special characters. Still, some alphabets
define so many characters that it is impossible to represent each of them as one byte. 
Unicode allows more flexibility. Unicode text is commonly referred to as 
“wide-character” strings, because each character may be represented with multiple 
bytes. Unicode is typically used in internationalized programs, to represent European 
and Asian character sets that have more characters than 8-bit bytes can represent. 
To store such rich text in computer memory, we say that characters are translated to
and from raw bytes using an encoding—the rules for translating a string of Unicode 
characters into a sequence of bytes, and extracting a string from a sequence of bytes. 
More procedurally, this translation back and forth between bytes and strings is defined 
by two terms: 


* Encoding is the process of translating a string of characters into its raw bytes form,
according to a desired encoding name.


* Decoding is the process of translating a raw string of bytes into its character string
form, according to its encoding name.


That is, we encode from string to raw bytes, and decode from raw bytes to string. For 
some encodings, the translation process is trivial—ASCII and Latin-1, for instance, 
map each character to a single byte, so no translation work is required. For other en- 
codings, the mapping can be more complex and yield multiple bytes per character. 
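
As a quick sketch of these two directions in 3.X code, the str encode method and the bytes decode method perform the translations just described. The sample text here is arbitrary; its first character is written with the escape '\xc4' (the character with code 196) so that the example does not depend on how this file itself is saved.

```python
text = '\xc4bc'                    # str of three characters; '\xc4' is code 196

latin = text.encode('latin-1')     # Latin-1: one byte per character
utf8 = text.encode('utf-8')        # UTF-8: code 196 needs two bytes

print(latin, len(latin))
print(utf8, len(utf8))

# Decoding with the matching encoding name recovers the original string
back_from_latin = latin.decode('latin-1')
back_from_utf8 = utf8.decode('utf-8')
print(back_from_latin == text, back_from_utf8 == text)
```

Note that decoding bytes with a name other than the one used to encode them can either fail or silently produce different characters, which is why the encoding name has to travel with the data.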


The widely used UTF-8 encoding, for example, allows a wide range of characters to be 
represented by employing a variable number of bytes scheme. Character codes less than 
128 are represented as a single byte; codes between 128 and 0x7ff (2047) are turned 
into two bytes, where each byte has a value between 128 and 255; and codes above 
0x7ff are turned into three- or four-byte sequences having values between 128 and 255.
This keeps simple ASCII strings compact, sidesteps byte ordering issues, and avoids 
null (zero) bytes that can cause problems for C libraries and networking. 
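
The size rules above can be checked with a short sketch. The four sample characters are arbitrary picks, one from each size range, written as escapes so that the code does not depend on the source file's own encoding.

```python
# One sample code point from each UTF-8 size range (arbitrary choices)
samples = ['a', '\xc4', '\u20ac', '\U0001f600']

for ch in samples:
    data = ch.encode('utf-8')
    print(hex(ord(ch)), len(data), list(data))

sizes = [len(ch.encode('utf-8')) for ch in samples]

# Every byte in a multibyte sequence falls in the range 128..255
multibyte_values = [b for ch in samples[1:] for b in ch.encode('utf-8')]
print(sizes, min(multibyte_values))
```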


Because encodings’ character maps assign characters to the same codes for compati- 
bility, ASCII is a subset of both Latin-1 and UTF-8; that is, a valid ASCII character string 
is also a valid Latin-1- and UTF-8-encoded string. This is also true when the data is 
stored in files: every ASCII file is a valid UTF-8 file, because ASCII is a 7-bit subset of 
UTF-8. 


Conversely, the UTF-8 encoding is binary compatible with ASCII for all character codes 
less than 128. Latin-1 and UTF-8 simply allow for additional characters: Latin-1 for 
characters mapped to values 128 through 255 within a byte, and UTF-8 for characters 
that may be represented with multiple bytes. Other encodings allow wider character 
sets in similar ways, but all of these—ASCII, Latin-1, UTF-8, and many others—are 
considered to be Unicode. 
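
The subset relationship is easy to verify in a sketch: the same ASCII bytes decode to the same string under all three encoding names, while a non-ASCII character is stored differently by Latin-1 and UTF-8. The sample values are arbitrary.

```python
raw = b'spam'                      # every byte value is below 128

decoded = [raw.decode(name) for name in ('ascii', 'latin-1', 'utf-8')]
print(decoded)                     # the same str three times

accented = '\xc4'                  # code 196, outside ASCII's range
as_latin = accented.encode('latin-1')
as_utf8 = accented.encode('utf-8')
print(as_latin, as_utf8)           # one byte versus two

try:
    accented.encode('ascii')       # ASCII has no code for this character
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```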


To Python programmers, encodings are specified as strings containing the encoding’s 
name. Python comes with roughly 100 different encodings; see the Python library 
reference for a complete list. Importing the module encodings and running 
help(encodings) shows you many encoding names as well; some are implemented in 
Python, and some in C. Some encodings have multiple names, too; for example, latin-1, 
iso_8859_1, and 8859 are all synonyms for the same encoding, Latin-1. We’ll revisit 
encodings later in this chapter, when we study techniques for writing Unicode strings 
in a script. 
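
One way to confirm that several names refer to the same encoding is the standard library's codecs.lookup call, which returns a record whose name attribute is a normalized name. This is a small sketch using the three Latin-1 synonyms mentioned above; the exact spelling of the normalized name is not asserted here, only that the three agree.

```python
import codecs

aliases = ['latin-1', 'iso_8859_1', '8859']

# Each alias should resolve to the same underlying codec
canonical = [codecs.lookup(name).name for name in aliases]
print(canonical)

# ...and therefore produce identical bytes for the same text
encoded = ['\xc4'.encode(name) for name in aliases]
print(encoded)
```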
For more on the Unicode story, see the Python standard manual set. It includes a
“Unicode HOWTO” in its “Python HOWTOs” section, which provides additional 
background that we will skip here in the interest of space. 


Python’s String Types 


At a more concrete level, the Python language provides string data types to represent 
character text in your scripts. The string types you will use in your scripts depend upon 
the version of Python you’re using. Python 2.X has a general string type for representing 
binary data and simple 8-bit text like ASCII, along with a specific type for representing 
multibyte Unicode text: 


* str for representing 8-bit text and binary data


* unicode for representing wide-character Unicode text


Python 2.X’s two string types are different (unicode allows for the extra size of characters 
and has extra support for encoding and decoding), but their operation sets largely 
overlap. The str string type in 2.X is used for text that can be represented with 8-bit 
bytes, as well as binary data that represents absolute byte values. 


By contrast, Python 3.X comes with three string object types—one for textual data and 
two for binary data: 


* str for representing Unicode text (both 8-bit and wider)
* bytes for representing binary data


* bytearray, a mutable flavor of the bytes type


As mentioned earlier, bytearray is also available in Python 2.6, but it’s simply a back- 
port from 3.0 with less content-specific behavior and is generally considered a 3.0 type. 


All three string types in 3.0 support similar operation sets, but they have different roles. 
The main goal behind this change in 3.X was to merge the normal and Unicode string 
types of 2.X into a single string type that supports both normal and Unicode text: 
developers wanted to remove the 2.X string dichotomy and make Unicode processing 
more natural. Given that ASCII and other 8-bit text is really a simple kind of Unicode, 
this convergence seems logically sound. 


To achieve this, the 3.0 str type is defined as an immutable sequence of characters (not 
necessarily bytes), which may be either normal text such as ASCII with one byte per 
character, or richer character set text such as UTF-8 Unicode that may include multi- 
byte characters. Strings processed by your script with this type are encoded per the 
platform default, but explicit encoding names may be provided to translate str objects 
to and from different schemes, both in memory and when transferring to and from files. 


While 3.0’s new str type does achieve the desired string/unicode merging, many pro- 
grams still need to process raw binary data that is not encoded per any text format. 
Image and audio files, as well as packed data used to interface with devices or C 
programs you might process with Python’s struct module, fall into this category. To
support processing of truly binary data, therefore, a new type, bytes, also was 
introduced. 


In 2.X, the general str type filled this binary data role, because strings were just se- 
quences of bytes (the separate unicode type handles wide-character strings). In 3.0, the 
bytes type is defined as an immutable sequence of 8-bit integers representing absolute 
byte values. Moreover, the 3.0 bytes type supports almost all the same operations that 
the str type does; this includes string methods, sequence operations, and even re mod- 
ule pattern matching, but not string formatting. 


A 3.0 bytes object really is a sequence of small integers, each of which is in the range 
0 through 255; indexing a bytes returns an int, slicing one returns another bytes, and 
running the list built-in on one returns a list of integers, not characters. When pro- 
cessed with operations that assume characters, though, the contents of bytes objects 
are assumed to be ASCII-encoded bytes (e.g., the isalpha method assumes each byte
is an ASCII character code). Further, bytes objects are printed as character strings in- 
stead of integers for convenience. 


While they were at it, Python developers also added a bytearray type in 3.0. 
bytearray is a variant of bytes that is mutable and so supports in-place changes. It 
supports the usual string operations that str and bytes do, as well as many of the same 
in-place change operations as lists (e.g., the append and extend methods, and assignment 
to indexes). Assuming your strings can be treated as raw bytes, bytearray finally adds 
direct in-place mutability for string data—something not possible without conversion 
to a mutable type in Python 2, and not supported by Python 3.0’s str or bytes. 
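
A brief sketch of these in-place operations follows; the sample content is arbitrary. Notice that index assignment and append take integer byte values, not one-character strings.

```python
buf = bytearray(b'spam')

buf[0] = ord('S')            # index assignment: an integer byte value
buf.append(ord('!'))         # append a single byte value
buf.extend(b' eggs')         # extend with another bytes object
print(buf)

frozen = bytes(buf)          # make an immutable bytes copy
try:
    frozen[0] = ord('s')     # bytes does not support in-place change
    changed = True
except TypeError:
    changed = False
print(frozen, changed)
```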


Although Python 2.6 and 3.0 offer much the same functionality, they package it dif- 
ferently. In fact, the mapping from 2.6 to 3.0 string types is not direct—2.6’s str equates 
to both str and bytes in 3.0, and 3.0’s str equates to both str and unicode in 2.6. 
Moreover, the mutability of 3.0’s bytearray is unique. 


In practice, though, this asymmetry is not as daunting as it might sound. It boils down 
to the following: in 2.6, you will use str for simple text and binary data and unicode 
for more advanced forms of text; in 3.0, you’ll use str for any kind of text (simple and 
Unicode) and bytes or bytearray for binary data. In practice, the choice is often made 
for you by the tools you use—especially in the case of file processing tools, the topic 
of the next section. 


Text and Binary Files 


File I/O (input and output) has also been revamped in 3.0 to reflect the str/bytes 
distinction and automatically support encoding Unicode text. Python now makes a 
sharp platform-independent distinction between text files and binary files: 
Text files

When a file is opened in text mode, reading its data automatically decodes its con- 
tent (per a platform default or a provided encoding name) and returns it as a str; 
writing takes a str and automatically encodes it before transferring it to the file. 
Text-mode files also support universal end-of-line translation and additional en- 
coding specification arguments. Depending on the encoding name, text files may 
also automatically process the byte order mark sequence at the start of a file (more 
on this momentarily). 


Binary files 
When a file is opened in binary mode by adding a b (lowercase only) to the mode 
string argument in the built-in open call, reading its data does not decode it in any 
way but simply returns its content raw and unchanged, as a bytes object; writing 
similarly takes a bytes object and transfers it to the file unchanged. Binary-mode 
files also accept a bytearray object for the content to be written to the file. 


Because the language sharply differentiates between str and bytes, you must decide 
whether your data is text or binary in nature and use either str or bytes objects to 
represent its content in your script, as appropriate. Ultimately, the mode in which you 
open a file will dictate which type of object your script will use to represent its content: 


* If you are processing image files, packed data created by other programs whose
content you must extract, or some device data streams, chances are good that you
will want to deal with it using bytes and binary-mode files. You might also opt for
bytearray if you wish to update the data without making copies of it in memory.


* If instead you are processing something that is textual in nature, such as program
output, HTML, internationalized text, or CSV or XML files, you'll probably want
to use str and text-mode files.


Notice that the mode string argument to built-in function open (its second argument) 
becomes fairly crucial in Python 3.0—its content not only specifies a file processing 
mode, but also implies a Python object type. By adding a b to the mode string, you specify 
binary mode and will receive, or must provide, a bytes object to represent the file’s 
content when reading or writing. Without the b, your file is processed in text mode, 
and you'll use str objects to represent its content in your script. For example, the modes 
rb, wb, and rb+ imply bytes; r, w+, and rt (the default) imply str. 
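
The effect of the mode string on object types can be sketched as follows. The scratch file name demo.txt and the use of a temporary directory are assumptions made only to keep the example self-contained, and the encoding name is passed explicitly so that the result does not depend on a platform default.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')      # hypothetical scratch file

    with open(path, 'w', encoding='utf-8') as f:    # text mode: write a str
        f.write('sp\xc4m')

    with open(path, 'r', encoding='utf-8') as f:    # text mode: decoded to str
        as_text = f.read()

    with open(path, 'rb') as f:                     # binary mode: raw bytes
        as_bytes = f.read()

print(type(as_text), len(as_text))      # four characters
print(type(as_bytes), len(as_bytes))    # five bytes: code 196 takes two in UTF-8
```

The same file yields four characters in text mode but five bytes in binary mode, because text mode decodes the two-byte UTF-8 sequence back into a single character.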


Text-mode files also handle the byte order marker (BOM) sequence that may appear at 
the start of files under certain encoding schemes. In the UTF-16 and UTF-32 encodings, 
for example, the BOM specifies big- or little-endian format (essentially, which end of 
a bitstring is most significant). A UTF-8 text file may also include a BOM to declare 
that it is UTF-8 in general, but this isn’t guaranteed. When reading and writing data 
using these encoding schemes, Python automatically skips or writes the BOM if it is 
implied by a general encoding name or if you provide a more specific encoding name 
to force the issue. For example, the BOM is always processed for “utf-16,” the more
specific encoding name “utf-16-le” specifies little-endian UTF-16 format, and the more
specific encoding name “utf-8-sig” forces Python to both skip and write a BOM on
input and output, respectively, for UTF-8 text (the general name “utf-8” does not).
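
These encoding names behave the same way for in-memory encoding as they do when passed to open, so their BOM handling can be sketched without any files. The two-character sample text is arbitrary, and the BOM constants come from the standard codecs module.

```python
import codecs

text = 'sp'

general = text.encode('utf-16')       # BOM first, then platform byte order
little = text.encode('utf-16-le')     # byte order named explicitly: no BOM
plain = text.encode('utf-8')          # general UTF-8 name: no BOM
signed = text.encode('utf-8-sig')     # UTF-8 with a BOM added at the front

print(general, little)
print(plain, signed)

# Decoding with the same names strips the BOM again
round_trip_16 = general.decode('utf-16')
round_trip_8 = signed.decode('utf-8-sig')
print(round_trip_16, round_trip_8)
```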


We'll learn more about BOMs and files in general in the section “Handling the BOM 
in 3.0” on page 926. First, let’s explore the implications of Python’s new Unicode 
string model. 


Python 3.0 Strings in Action 


Let’s step through a few examples that demonstrate how the 3.0 string types are used. 
One note up front: the code in this section was run with and applies to 3.0 only. Still, 
basic string operations are generally portable across Python versions. Simple ASCII 
strings represented with the str type work the same in 2.6 and 3.0 (and exactly as we 
saw in Chapter 7 of this book). Moreover, although there is no bytes type in Python 
2.6 (it has just the general str), it can usually run code that thinks there is—in 2.6, the 
call bytes(X) is present as a synonym for str(X), and the new literal form b'...' is taken 
to be the same as the normal string literal '...'. You may still run into version skew in 
some isolated cases, though; the 2.6 bytes call, for instance, does not allow the second 
argument (encoding name) required by 3.0’s bytes. 


Literals and Basic Properties 


Python 3.0 string objects originate when you call a built-in function such as str or 
bytes, process a file created by calling open (described in the next section), or code literal 
syntax in your script. For the latter, a new literal form, b'xxx' (and equivalently, 
B'xxx') is used to create bytes objects in 3.0, and bytearray objects may be created by 
calling the bytearray function, with a variety of possible arguments. 


More formally, in 3.0 all the current string literal forms ('xxx', "xxx", and 
triple-quoted blocks) generate a str; adding a b or B just before any of them creates 
a bytes instead. This new b'...' bytes literal is similar in form to the r'...' raw 
string used to suppress backslash escapes. Consider the following, run in 3.0: 


C:\misc> c:\python30\python 


>>> B = b'spam'               # Make a bytes object (8-bit bytes)
>>> S = 'eggs'                # Make a str object (Unicode characters, 8-bit or wider)

>>> type(B), type(S)
(<class 'bytes'>, <class 'str'>)

>>> B                         # Prints as a character string, really sequence of ints
b'spam'
>>> S
'eggs'


The bytes object is actually a sequence of short integers, though it prints its content as 
characters whenever possible: 


>>> B[0], S[0]                # Indexing returns an int for bytes, str for str
(115, 'e')
>>> B[1:], S[1:]              # Slicing makes another bytes or str object
(b'pam', 'ggs')

>>> list(B), list(S)
([115, 112, 97, 109], ['e', 'g', 'g', 's'])    # bytes is really ints


The bytes object is immutable, just like str (though bytearray, described later, is not); 
you cannot assign a str, bytes, or integer to an offset of a bytes object. The bytes prefix 
also works for any string literal form: 

>>> B[0] = 'x'                # Both are immutable
TypeError: 'bytes' object does not support item assignment

>>> S[0] = 'x'
TypeError: 'str' object does not support item assignment

>>> B = B"""                  # bytes prefix works on single, double, triple quotes
... xxxx
... yyyy
... """
>>> B
b'\nxxxx\nyyyy\n'


As mentioned earlier, in Python 2.6 the b'xxx' literal is present for compatibility but is 
the same as 'xxx' and makes a str, and bytes is just a synonym for str; as you’ve seen, 
in 3.0 both of these address the distinct bytes type. Also note that the u'xxx' and 
U'xxx' Unicode string literal forms in 2.6 are gone in 3.0; use 'xxx' instead, since all 
strings are Unicode, even if they contain all ASCII characters (more on writing non- 
ASCII Unicode text in the section “Coding Non-ASCII Text” on page 905). 


Conversions 


Although Python 2.X allowed str and unicode type objects to be mixed freely (if the 
strings contained only 7-bit ASCII text), 3.0 draws a much sharper distinction—str 
and bytes type objects never mix automatically in expressions and never are converted 
to one another automatically when passed to functions. A function that expects an 
argument to be a str object won’t generally accept a bytes, and vice versa. 


Because of this, Python 3.0 basically requires that you commit to one type or the other, 
or perform manual, explicit conversions: 

• str.encode() and bytes(S, encoding) translate a string to its raw bytes form and 
  create a bytes from a str in the process. 

• bytes.decode() and str(B, encoding) translate raw bytes into its string form and 
  create a str from a bytes in the process. 


These encode and decode methods (as well as file objects, described in the next section) 
use either a default encoding for your platform or an explicitly passed-in encoding 
name. For example, in 3.0: 


>>> S = 'eggs'
>>> S.encode()                         # str to bytes: encode text into raw bytes
b'eggs'
>>> bytes(S, encoding='ascii')         # str to bytes, alternative
b'eggs'

>>> B = b'spam'
>>> B.decode()                         # bytes to str: decode raw bytes into text
'spam'
>>> str(B, encoding='ascii')           # bytes to str, alternative
'spam'


Two cautions here. First of all, your platform’s default encoding is available in the 
sys module, but the encoding argument to bytes is not optional, even though it is in 
str.encode (and bytes.decode). 


Second, although calls to str do not require the encoding argument like bytes does, 
leaving it off in str calls does not mean it defaults—instead, a str call without an 
encoding returns the bytes object’s print string, not its str converted form (this is 
usually not what you'll want!). Assuming B and S are still as in the prior listing: 


>>> import sys
>>> sys.platform                       # Underlying platform
'win32'
>>> sys.getdefaultencoding()           # Default encoding for str here
'utf-8'

>>> bytes(S)
TypeError: string argument without an encoding

>>> str(B)                             # str without encoding
"b'spam'"                              # A print string, not conversion!
>>> len(str(B))
7
>>> len(str(B, encoding='ascii'))      # Use encoding to convert to str
4
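Because a bare str(B) call silently returns a print string, it can help to route conversions through small helpers that always encode or decode explicitly. The sketch below is one way to do so; the names to_bytes and to_text are hypothetical, and the UTF-8 default is an assumption rather than anything Python requires:

```python
def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is a str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return bytes(value)

def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return str(value)

print(to_bytes('eggs'))       # b'eggs'
print(to_text(b'spam'))       # spam
print(str(b'spam'))           # b'spam'  (the print string, not a conversion)
```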


Coding Unicode Strings 


Encoding and decoding become more meaningful when you start dealing with actual 
non-ASCII Unicode text. To code arbitrary Unicode characters in your strings, some 
of which you might not even be able to type on your keyboard, Python string literals 
support both "\xNN" hex byte value escapes and "\uNNNN" and "\UNNNNNNNN" Unicode 
escapes. In Unicode escapes, the first form gives four hex digits to 
encode a 2-byte (16-bit) character code, and the second gives eight hex digits for a 
4-byte (32-bit) code. 


Coding ASCII Text 


Let’s step through some examples that demonstrate text coding basics. As we’ve seen, 
ASCII text is a simple type of Unicode, stored as a sequence of byte values that represent 
characters: 


C:\misc> c:\python30\python 


>>> ord('X')                  # 'X' has binary value 88 in the default encoding
88
>>> chr(88)                   # 88 stands for character 'X'
'X'

>>> S = 'XYZ'                 # A Unicode string of ASCII text
>>> S
'XYZ'
>>> len(S)                    # 3 characters long
3
>>> [ord(c) for c in S]       # 3 bytes with integer ordinal values
[88, 89, 90]


Normal 7-bit ASCII text like this is represented with one character per byte under each 
of the Unicode encoding schemes described earlier in this chapter: 


>>> S.encode('ascii')         # Values 0..127 in 1 byte (7 bits) each
b'XYZ'
>>> S.encode('latin-1')       # Values 0..255 in 1 byte (8 bits) each
b'XYZ'
>>> S.encode('utf-8')         # Values 0..127 in 1 byte, 128..2047 in 2, others 3 or 4
b'XYZ'


In fact, the bytes object returned by encoding ASCII text this way is really a sequence 
of short integers, which just happen to print as ASCII characters when possible: 


>>> S.encode('latin-1')[0]
88
>>> list(S.encode('latin-1'))
[88, 89, 90]


Coding Non-ASCII Text 


To code non-ASCII characters, you may use hex or Unicode escapes in your strings; 
hex escapes are limited to a single byte’s value, but Unicode escapes can name 
characters with values two and four bytes wide. The hex values 0xC4 and 0xE8, for instance, 
are codes for two special accented characters outside the 7-bit range of ASCII, but we 
can embed them in 3.0 str objects because str supports Unicode today: 

>>> chr(0xc4)                 # 0xC4, 0xE8: characters outside ASCII's range
'Ä'
>>> chr(0xe8)
'è'

>>> S = '\xc4\xe8'            # Single byte 8-bit hex escapes
>>> S
'Äè'

>>> S = '\u00c4\u00e8'        # 16-bit Unicode escapes
>>> S
'Äè'
>>> len(S)                    # 2 characters long (not number of bytes!)
2
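The three escape forms and chr all name the same code point, which a quick comparison confirms. This is a minimal sketch; the final lines assume Python 3.3 or later, where every code point counts as one character regardless of width:

```python
a = '\xc4'          # hex escape: a code point that fits in one byte
b = '\u00c4'        # 16-bit Unicode escape
c = '\U000000c4'    # 32-bit Unicode escape
d = chr(0xC4)       # built from the integer code point

print(a == b == c == d)     # True
print(len(a), hex(ord(a)))  # 1 0xc4

wide = '\U0001F40D'         # beyond 16 bits: needs the eight-digit form
print(len(wide), hex(ord(wide)))  # 1 0x1f40d
```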


Encoding and Decoding Non-ASCII Text 


Now, if we try to encode a non-ASCII string into raw bytes as ASCII, we’ll get an 
error. Encoding as Latin-1 works, though, and allocates one byte per character; 
encoding as UTF-8 allocates 2 bytes per character instead. If you write this string to a file, 
the raw bytes shown here are what is actually stored in the file for the encoding types 
given: 

>>> S = '\u00c4\u00e8'
>>> S
'Äè'
>>> len(S)
2

>>> S.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1:
ordinal not in range(128)

>>> S.encode('latin-1')       # One byte per character
b'\xc4\xe8'
>>> S.encode('utf-8')         # Two bytes per character
b'\xc3\x84\xc3\xa8'

>>> len(S.encode('latin-1'))  # 2 bytes in latin-1, 4 in utf-8
2
>>> len(S.encode('utf-8'))
4


Note that you can also go the other way, reading raw bytes from a file and decoding 
them back to a Unicode string. However, as we’ll see later, the encoding mode you give 
to the open call causes this decoding to be done for you automatically on input (and 
avoids issues that may arise from reading partial character sequences when reading by 
blocks of bytes): 

>>> B = b'\xc4\xe8'
>>> B
b'\xc4\xe8'
>>> len(B)                    # 2 raw bytes, 2 characters
2
>>> B.decode('latin-1')       # Decode to latin-1 text
'Äè'

>>> B = b'\xc3\x84\xc3\xa8'
>>> len(B)                    # 4 raw bytes
4
>>> B.decode('utf-8')
'Äè'
>>> len(B.decode('utf-8'))    # 2 Unicode characters
2
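The partial-sequence issue mentioned above shows up as soon as encoded bytes are split at an arbitrary block boundary. The following sketch, assuming a current Python 3.X, contrasts a naive per-block decode with the standard library's incremental decoder, which is roughly the service text-mode files provide internally:

```python
import codecs

data = 'A\xc4B\xe8C'.encode('utf-8')   # b'A\xc3\x84B\xc3\xa8C', 7 bytes

# Splitting after 2 bytes cuts the 2-byte sequence for 0xC4 in half.
first, second = data[:2], data[2:]

try:
    first.decode('utf-8')
except UnicodeDecodeError as exc:
    print('naive block decode fails:', exc.reason)

# An incremental decoder buffers the incomplete sequence between calls.
decoder = codecs.getincrementaldecoder('utf-8')()
pieces = [decoder.decode(first), decoder.decode(second, final=True)]
result = ''.join(pieces)
print(ascii(pieces))     # ['A', '\xc4B\xe8C']
```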


Other Unicode Coding Techniques 


Some encodings use even larger byte sequences to represent characters. When needed, 
you can specify both 16- and 32-bit Unicode values for characters in your strings: use 
"\u..." with four hex digits for the former, and "\U...." with eight hex digits for the 
latter: 

>>> S = 'A\u00c4B\U000000e8C'
>>> S                         # A, B, C, and 2 non-ASCII characters
'AÄBèC'
>>> len(S)                    # 5 characters long
5

>>> S.encode('latin-1')
b'A\xc4B\xe8C'
>>> len(S.encode('latin-1'))  # 5 bytes in latin-1
5

>>> S.encode('utf-8')
b'A\xc3\x84B\xc3\xa8C'
>>> len(S.encode('utf-8'))    # 7 bytes in utf-8
7


Interestingly, some other encodings may use very different byte formats. The cp500 
EBCDIC encoding, for example, doesn’t even encode ASCII the same way as the 
encodings we’ve been using so far (since Python encodes and decodes for us, we only 
generally need to care about this when providing encoding names): 


>>> S
'AÄBèC'
>>> S.encode('cp500')         # Two other Western European encodings
b'\xc1c\xc2T\xc3'
>>> S.encode('cp850')         # 5 bytes each
b'A\x8eB\x8aC'

>>> S = 'spam'                # ASCII text is the same in most
>>> S.encode('latin-1')
b'spam'
>>> S.encode('utf-8')
b'spam'
>>> S.encode('cp500')         # But not in cp500: IBM EBCDIC!
b'\xa2\x97\x81\x94'
>>> S.encode('cp850')
b'spam'


Technically speaking, you can also build Unicode strings piecemeal using chr instead 
of Unicode or hex escapes, but this might become tedious for large strings: 


>>> S = 'A' + chr(0xC4) + 'B' + chr(0xE8) + 'C'
>>> S
'AÄBèC'


Two cautions here. First, Python 3.0 allows special characters to be coded with both 
hex and Unicode escapes in str strings, but only with hex escapes in bytes strings; 
Unicode escape sequences are silently taken verbatim in bytes literals, not as escapes. 
In fact, bytes must be decoded to str strings to print their non-ASCII characters 
properly: 

>>> S = 'A\xC4B\xE8C'                  # str recognizes hex and Unicode escapes
>>> S
'AÄBèC'

>>> S = 'A\u00C4B\U000000E8C'
>>> S
'AÄBèC'

>>> B = b'A\xC4B\xE8C'                 # bytes recognizes hex but not Unicode
>>> B
b'A\xc4B\xe8C'

>>> B = b'A\u00C4B\U000000E8C'         # Escape sequences taken literally!
>>> B
b'A\\u00C4B\\U000000E8C'

>>> B = b'A\xC4B\xE8C'                 # Use hex escapes for bytes
>>> B                                  # Prints non-ASCII as hex
b'A\xc4B\xe8C'
>>> print(B)
b'A\xc4B\xe8C'
>>> B.decode('latin-1')                # Decode as latin-1 to interpret as text
'AÄBèC'


Second, bytes literals require characters to be either ASCII characters or, if their 
values are greater than 127, to be escaped; str strings, on the other hand, allow literals 
containing any character in the source character set (which, as discussed later, defaults 
to UTF-8 unless an encoding declaration is given in the source file): 

>>> S = 'AÄBèC'                        # Chars from UTF-8 if no encoding declaration
>>> S
'AÄBèC'

>>> B = b'AÄBèC'
SyntaxError: bytes can only contain ASCII literal characters.

>>> B = b'A\xC4B\xE8C'                 # Chars must be ASCII, or escapes
>>> B
b'A\xc4B\xe8C'
>>> B.decode('latin-1')
'AÄBèC'

>>> S.encode()                         # Source code encoded per UTF-8 by default
b'A\xc3\x84B\xc3\xa8C'                 # Uses system default to encode, unless passed
>>> S.encode('utf-8')
b'A\xc3\x84B\xc3\xa8C'

>>> B.decode()                         # Raw bytes do not correspond to utf-8
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: ...


Converting Encodings 


So far, we’ve been encoding and decoding strings to inspect their structure. More gen- 
erally, we can always convert a string to a different encoding than the source character 
set default, but we must provide an explicit encoding name to encode to and decode 
from: 


>>> S = 'AÄBèC'
>>> S
'AÄBèC'
>>> S.encode()                         # Default utf-8 encoding
b'A\xc3\x84B\xc3\xa8C'

>>> T = S.encode('cp500')              # Convert to EBCDIC
>>> T
b'\xc1c\xc2T\xc3'

>>> U = T.decode('cp500')              # Convert back to Unicode
>>> U
'AÄBèC'

>>> U.encode()                         # Default utf-8 encoding again
b'A\xc3\x84B\xc3\xa8C'


Keep in mind that the special Unicode and hex character escapes are only necessary 
when you code non-ASCII Unicode strings manually. In practice, you’ll often load such 
text from files instead. As we’ll see later in this chapter, 3.0’s file object (created with 
the open built-in function) automatically decodes text strings as they are read and 
encodes them when they are written; because of this, your script can often deal with 
strings generically, without having to code special characters directly. 


Later in this chapter we’ll also see that it’s possible to convert between encodings when 
transferring strings to and from files, using a technique very similar to that in the last 
example; although you'll still need to provide explicit encoding names when opening 
a file, the file interface does most of the conversion work for you automatically. 
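The decode-then-encode pattern used for in-memory conversion can be wrapped in a one-line function. This is only a sketch; the name transcode is hypothetical, and both encoding names must be supplied by the caller:

```python
def transcode(data, source, target):
    """Re-encode bytes from one named encoding to another via str."""
    return data.decode(source).encode(target)

latin = b'A\xc4B\xe8C'                       # the 5-character sample text as Latin-1
utf8 = transcode(latin, 'latin-1', 'utf-8')
print(utf8)                                  # b'A\xc3\x84B\xc3\xa8C'

ebcdic = transcode(utf8, 'utf-8', 'cp500')
print(transcode(ebcdic, 'cp500', 'latin-1') == latin)   # True
```

The intermediate str is the neutral form: bytes in one encoding are never converted directly to bytes in another.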


Coding Unicode Strings in Python 2.6 


Now that I’ve shown you the basics of Unicode strings in 3.0, I need to explain that 
you can do much the same in 2.6, though the tools differ. unicode is available in Python 
2.6, but it is a distinct data type from str, and it allows free mixing of normal and 
Unicode strings when they are compatible. In fact, you can essentially pretend 2.6’s 
str is 3.0’s bytes when it comes to decoding raw bytes into a Unicode string, as long 
as it’s in the proper form. Here is 2.6 in action (all other sections in this chapter are run 
under 3.0): 

C:\misc> c:\python26\python
>>> import sys
>>> sys.version
'2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)]'

>>> S = 'A\xC4B\xE8C'                  # String of 8-bit bytes
>>> print S                            # Some are non-ASCII
AÄBèC

>>> S.decode('latin-1')                # Decode byte to latin-1 Unicode
u'A\xc4B\xe8C'

>>> S.decode('utf-8')                  # Not formatted as utf-8
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid data

>>> S.decode('ascii')                  # Outside ASCII range
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal
not in range(128)


To store arbitrarily encoded Unicode text, make a unicode object with the u'xxx' literal 
form (this literal is no longer available in 3.0, since all strings support Unicode in 3.0): 

>>> U = u'A\xC4B\xE8C'                 # Make Unicode string, hex escapes
>>> U
u'A\xc4B\xe8C'
>>> print U
AÄBèC


Once you’ve created it, you can convert Unicode text to different raw byte encodings, 
similar to encoding str objects into bytes objects in 3.0: 


>>> U.encode('latin-1')                # Encode per latin-1: 8-bit bytes
'A\xc4B\xe8C'
>>> U.encode('utf-8')                  # Encode per utf-8: multibyte
'A\xc3\x84B\xc3\xa8C'


Non-ASCII characters can be coded with hex or Unicode escapes in string literals in 
2.6, just as in 3.0. However, as with bytes in 3.0, the "\u..." and "\U..." escapes are 
recognized only for unicode strings in 2.6, not 8-bit str strings: 

C:\misc> c:\python26\python
>>> U = u'A\xC4B\xE8C'                 # Hex escapes for non-ASCII
>>> U
u'A\xc4B\xe8C'
>>> print U
AÄBèC

>>> U = u'A\u00C4B\U000000E8C'         # Unicode escapes for non-ASCII
>>> U                                  # u'' = 16 bits, U'' = 32 bits
u'A\xc4B\xe8C'
>>> print U
AÄBèC

>>> S = 'A\xC4B\xE8C'                  # Hex escapes work
>>> S
'A\xc4B\xe8C'
>>> print S                            # But some print oddly, unless decoded
A-BFC
>>> print S.decode('latin-1')
AÄBèC

>>> S = 'A\u00C4B\U000000E8C'          # Not Unicode escapes: taken literally!
>>> S
'A\\u00C4B\\U000000E8C'
>>> print S
A\u00C4B\U000000E8C
>>> len(S)
19


Like 3.0’s str and bytes, 2.6’s unicode and str share nearly identical operation sets, so 
unless you need to convert to other encodings you can often treat unicode as though it 
were str. One of the primary differences between 2.6 and 3.0, though, is that 
unicode and non-Unicode str objects can be freely mixed in expressions, and as long 
as the str is compatible with the unicode’s encoding Python will automatically convert 
it up to unicode (in 3.0, str and bytes never mix automatically and require manual 
conversions): 


>>> u'ab' + 'cd' # Can mix if compatible in 2.6 
u'abcd' # 'ab' + b'cd' not allowed in 3.0 


In fact, the difference in types is often trivial to your code in 2.6. Like normal strings, 
Unicode strings may be concatenated, indexed, sliced, matched with the re module, 
and so on, and they cannot be changed in-place. If you ever need to convert between 
the two types explicitly, you can use the built-in str and unicode functions: 


>>> str(u'spam')                       # Unicode to normal
'spam'
>>> unicode('spam')                    # Normal to Unicode
u'spam'


However, this liberal approach to mixing string types in 2.6 only works if the string is 
compatible with the unicode object’s encoding type: 


>>> S = 'A\xC4B\xE8C'                  # Can't mix if incompatible
>>> U = u'A\xC4B\xE8C'
>>> S + U
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal
not in range(128)

>>> S.decode('latin-1') + U            # Manual conversion still required
u'A\xc4B\xe8CA\xc4B\xe8C'

>>> print S.decode('latin-1') + U
AÄBèCAÄBèC


Finally, as we’ll see in more detail later in this chapter, 2.6’s open call supports only files 
of 8-bit bytes, returning their contents as str strings; it’s up to you to interpret the 
contents as text or binary data and decode if needed. To read and write Unicode files 
and encode or decode their content automatically, use 2.6’s codecs.open call, 
documented in the 2.6 library manual. This call provides much the same functionality as 
3.0’s open and uses 2.6 unicode objects to represent file content: reading a file translates 
encoded bytes into decoded Unicode characters, and writing translates strings to the 
desired encoding specified when the file is opened. 
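For comparison, here is the 3.X side of that equivalence: the built-in open takes an encoding name and translates on both input and output. The sketch below assumes a current Python 3.X; the scratch directory and file names are hypothetical:

```python
import os
import tempfile

text = 'A\xc4B\xe8C'          # 5 characters, coded with escapes to keep the source ASCII
folder = tempfile.mkdtemp()   # hypothetical scratch directory

sizes = {}
for name in ('latin-1', 'utf-8'):
    path = os.path.join(folder, name + '.txt')
    with open(path, 'w', encoding=name) as f:   # encode on output
        f.write(text)
    with open(path, 'r', encoding=name) as f:   # decode on input
        assert f.read() == text
    sizes[name] = os.path.getsize(path)

print(sizes)    # {'latin-1': 5, 'utf-8': 7}
```

The same five characters occupy five bytes on disk as Latin-1 and seven as UTF-8, matching the in-memory encode results shown earlier.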


Source File Character Set Encoding Declarations 


Unicode escape codes are fine for the occasional Unicode character in string literals, 
but they can become tedious if you need to embed non-ASCII text in your strings 
frequently. For strings you code within your script files, Python uses the UTF-8 en- 
coding by default, but it allows you to change this to support arbitrary character sets 
by including a comment that names your desired encoding. The comment must be of 
this form and must appear as either the first or second line in your script in either Python 
2.6 or 3.0: 


# -*- coding: latin-1 -*- 


When a comment of this form is present, Python will recognize strings represented 
natively in the given encoding. This means you can edit your script file in a text editor 
that accepts and displays accented and other non-ASCII characters correctly, and 
Python will decode them correctly in your string literals. For example, notice how the 
comment at the top of the following file, text.py, allows Latin-1 characters to be 
embedded in strings: 


# -*- coding: latin-1 -*-
# Any of the following string literal forms work in latin-1.
# Changing the encoding above to either ascii or utf-8 fails,
# because the 0xc4 and 0xe8 in myStr1 are not valid in either.

myStr1 = 'aÄBèC'

myStr2 = 'A\u00c4B\U000000e8C'

myStr3 = 'A' + chr(0xC4) + 'B' + chr(0xE8) + 'C'

import sys
print('Default encoding:', sys.getdefaultencoding())

for aStr in myStr1, myStr2, myStr3:
    print('{0}, strlen={1}, '.format(aStr, len(aStr)), end='')

    bytes1 = aStr.encode()              # Per default utf-8: 2 bytes for non-ASCII
    bytes2 = aStr.encode('latin-1')     # One byte per char
    #bytes3 = aStr.encode('ascii')      # ASCII fails: outside 0..127 range

    print('byteslen1={0}, byteslen2={1}'.format(len(bytes1), len(bytes2)))


When run, this script produces the following output: 


C:\misc> c:\python30\python text.py
Default encoding: utf-8
aÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5


Since most programmers are likely to fall back on the standard UTF-8 encoding, I’ll 
defer to Python’s standard manual set for more details on this option and other 
advanced Unicode support topics, such as properties and character name escapes in 
strings. 
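The effect of an encoding declaration can also be observed without saving a file, by handing raw source bytes to the compile built-in, which honors the declaration when given bytes. This is a sketch under that assumption; the '<latin1-demo>' file name is a placeholder:

```python
# Source bytes as they would sit in a file saved by a Latin-1 editor:
# 0xC4 and 0xE8 are single bytes here, not UTF-8 sequences.
source = b"# -*- coding: latin-1 -*-\nmyStr = 'A\xc4B\xe8C'\n"

namespace = {}
exec(compile(source, '<latin1-demo>', 'exec'), namespace)

decoded = namespace['myStr']
print(len(decoded), [hex(ord(c)) for c in decoded])
# 5 ['0x41', '0xc4', '0x42', '0xe8', '0x43']
```

Each single non-ASCII byte in the source becomes one Unicode character in the resulting str, which is what the declaration promises.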


Using 3.0 Bytes Objects 


We studied a wide variety of operations available for Python 3.0’s general str string 
type in Chapter 7; the basic string type works identically in 2.6 and 3.0, so we won’t 
rehash this topic. Instead, let’s dig a bit deeper into the operation sets provided by the 
new bytes type in 3.0. 


As mentioned previously, the 3.0 bytes object is a sequence of small integers, each of 
which is in the range 0 through 255, that happens to print as ASCII characters when 
displayed. It supports sequence operations and most of the same methods available on 
str objects (and present in 2.X’s str type). However, bytes does not support the 
format method or the % formatting expression, and you cannot mix and match bytes and 
str type objects without explicit conversions; you generally will use all str type objects 
and text files for text data, and all bytes type objects and binary files for binary data. 


Method Calls 


If you really want to see what attributes str has that bytes doesn’t, you can always 
check their dir built-in function results. The output can also tell you something about 
the expression operators they support (e.g., __mod__ and __rmod__ implement the % 
operator): 


C:\misc> c:\python30\python

# Attributes unique to str

>>> set(dir('abc')) - set(dir(b'abc'))
{'isprintable', 'format', '__mod__', 'encode', 'isidentifier',
'_formatter_field_name_split', 'isnumeric', '__rmod__', 'isdecimal',
'_formatter_parser', 'maketrans'}

# Attributes unique to bytes

>>> set(dir(b'abc')) - set(dir('abc'))
{'decode', 'fromhex'}


As you can see, str and bytes have almost identical functionality. Their unique 
attributes are generally methods that don’t apply to the other; for instance, decode 
translates a raw bytes into its str representation, and encode translates a string into its raw 
bytes representation. Most of the methods are the same, though bytes methods require 
bytes arguments (again, 3.0 string types don’t mix). Also recall that bytes objects are 
immutable, just like str objects in both 2.6 and 3.0 (error messages here have been 
shortened for brevity): 


>>> B = b'spam'                        # b'...' bytes literal
>>> B.find(b'pa')
1

>>> B.replace(b'pa', b'XY')            # bytes methods expect bytes arguments
b'sXYm'

>>> B.split(b'pa')
[b's', b'm']

>>> B
b'spam'

>>> B[0] = 'x'
TypeError: 'bytes' object does not support item assignment


One notable difference is that string formatting works only on str objects in 3.0, not 
on bytes objects (see Chapter 7 for more on string formatting expressions and 
methods): 


>>> b'%s' % 99
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'

>>> '%s' % 99
'99'

>>> b'{0}'.format(99)
AttributeError: 'bytes' object has no attribute 'format'

>>> '{0}'.format(99)
'99'
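When formatted output must end up as bytes, one portable approach is to format as text and encode the result. The sketch below assumes ASCII-only content; the variable names are illustrative. (Releases from Python 3.5 on did add % formatting for bytes, but the format method remains str-only.)

```python
count = 99

# Format as text first, then encode the finished string to bytes.
line = '{0} bottles'.format(count).encode('ascii')
print(line)     # b'99 bottles'

# Or convert each piece to bytes and join the pieces.
parts = [b'count=', str(count).encode('ascii')]
joined = b''.join(parts)
print(joined)   # b'count=99'
```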


Sequence Operations 


Besides method calls, all the usual generic sequence operations you know (and possibly 
love) from Python 2.X strings and lists work as expected on both str and bytes in 3.0; 
this includes indexing, slicing, concatenation, and so on. Notice in the following that 
indexing a bytes object returns an integer giving the byte’s binary value; bytes really is 
a sequence of 8-bit integers, but it prints as a string of ASCII-coded characters when 
displayed as a whole for convenience. To check a given byte’s value, use the chr built- 
in to convert it back to its character, as in the following: 


>>> B = b'spam'               # A sequence of small ints
>>> B                         # Prints as ASCII characters
b'spam'

>>> B[0]                      # Indexing yields an int
115
>>> B[-1]
109

>>> chr(B[0])                 # Show character for int
's'
>>> list(B)                   # Show all the byte's int values
[115, 112, 97, 109]

>>> B[1:], B[:-1]
(b'pam', b'spa')
>>> len(B)
4

>>> B + b'lmn'
b'spamlmn'
>>> B * 4
b'spamspamspamspam'
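One consequence of bytes being a sequence of integers is that indexing and one-item slicing give different types. The following short sketch, assuming a current Python 3.X, shows the distinction and two ways back to a single-byte bytes object:

```python
B = b'spam'

print(B[0])                   # 115: indexing gives an int
print(B[0:1])                 # b's': a one-item slice is still bytes
print(bytes([B[0]]))          # b's': rebuild bytes from the int
print([chr(i) for i in B])    # ['s', 'p', 'a', 'm']
print(115 in B, b'pa' in B)   # True True
```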


Other Ways to Make bytes Objects 


So far, we’ve been mostly making bytes objects with the b'...' literal syntax; they can 
also be created by calling the bytes constructor with a str and an encoding name, calling 
the bytes constructor with an iterable of integers representing byte values, or encoding 
a str object per the default (or passed-in) encoding. As we’ve seen, encoding takes a 
str and returns the raw binary byte values of the string according to the encoding 
specification; conversely, decoding takes a raw bytes sequence and decodes it to its 
string representation, a series of possibly wide characters. Both operations create new 
string objects: 

>>> B = b'abc'
>>> B
b'abc'

>>> B = bytes('abc', 'ascii')
>>> B
b'abc'

>>> ord('a')
97
>>> B = bytes([97, 98, 99])
>>> B
b'abc'

>>> B = 'spam'.encode()       # Or bytes()
>>> B
b'spam'
>>>
>>> S = B.decode()            # Or str()
>>> S
'spam'


From a larger perspective, the last two of these operations are really tools for 
converting between str and bytes, a topic introduced earlier and expanded upon in the next 
section. 
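A few further construction forms are worth trying alongside those above. This is a minimal sketch, assuming a current Python 3.X; fromhex is the bytes-only method that appeared in the earlier dir comparison:

```python
print(bytes([97, 98, 99]))            # b'abc': from an iterable of ints
print(bytes(range(65, 68)))           # b'ABC'
print(bytes(3))                       # b'\x00\x00\x00': an int gives zero-filled bytes
print(bytes.fromhex('73 70 61 6d'))   # b'spam': from hex digit pairs

try:
    bytes([256])                      # each value must fit in one byte
except ValueError as exc:
    print('ValueError:', exc)
```

Note the trap in the third line: bytes(3) builds three zero bytes, not the byte b'3'.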


Mixing String Types 


In the replace call of the section “Method Calls” on page 913, we had to pass in two 
bytes objects—str types won’t work there. Although Python 2.X automatically con- 
verts str to and from unicode when possible (i.e., when the str is 7-bit ASCII text), 
Python 3.0 requires specific string types in some contexts and expects manual conver- 
sions if needed: 


# Must pass expected types to function and method calls

>>> B = b'spam'

>>> B.replace('pa', 'XY')
TypeError: expected an object with the buffer interface

>>> B.replace(b'pa', b'XY')
b'sXYm'

>>> B = B'spam'
>>> B.replace(bytes('pa'), bytes('xy'))
TypeError: string argument without an encoding

>>> B.replace(bytes('pa', 'ascii'), bytes('xy', 'utf-8'))
b'sxym'

# Must convert manually in mixed-type expressions

>>> b'ab' + 'cd'
TypeError: can't concat bytes to str

>>> b'ab'.decode() + 'cd'              # bytes to str
'abcd'
>>> b'ab' + 'cd'.encode()              # str to bytes
b'abcd'
>>> b'ab' + bytes('cd', 'ascii')       # str to bytes
b'abcd'


Although you can create bytes objects yourself to represent packed binary data, they 
can also be made automatically by reading files opened in binary mode, as we’ll see in 
more detail later in this chapter. First, though, we should introduce bytes’s very close, 
and mutable, cousin. 
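Not every mixed-type operation raises an error, which is worth knowing before relying on the exceptions shown above. In this sketch, assuming a current Python 3.X, equality between bytes and str quietly yields False, while membership tests fail loudly:

```python
B, S = b'spam', 'spam'

print(B == S)             # False: different types never compare equal
print(B.decode() == S)    # True once both sides are str
print(B == S.encode())    # True once both sides are bytes

try:
    S in B                # membership needs a bytes-like object or an int
except TypeError as exc:
    print('TypeError:', exc)
```

Because the equality test fails silently, a forgotten conversion can surface as a comparison that is never true rather than as an exception.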


Using 3.0 (and 2.6) bytearray Objects 


So far we’ve focused on str and bytes, since they subsume Python 2’s unicode and 
str. Python 3.0 has a third string type, though—bytearray, a mutable sequence of 
integers in the range 0 through 255, is essentially a mutable variant of bytes. As such, 
it supports the same string methods and sequence operations as bytes, as well as many 
of the mutable in-place-change operations supported by lists. The bytearray type is 
also available in Python 2.6 as a back-port from 3.0, but it does not enforce the strict 
text/binary distinction there that it does in 3.0. 


Let’s take a quick tour. bytearray objects may be created by calling the bytearray built- 
in. In Python 2.6, any string may be used to initialize: 


# Creation in 2.6: a mutable sequence of small (0..255) ints

>>> S = 'spam'
>>> C = bytearray(S)                   # A back-port from 3.0 in 2.6
>>> C                                  # b'..' == '..' in 2.6 (str)
bytearray(b'spam')


In Python 3.0, an encoding name or byte string is required, because text and binary 
strings do not mix, though byte strings may reflect encoded Unicode text: 


# Creation in 3.0: text/binary do not mix

>>> S = 'spam'
>>> C = bytearray(S)
TypeError: string argument without an encoding

>>> C = bytearray(S, 'latin1')         # A content-specific type in 3.0
>>> C
bytearray(b'spam')

>>> B = b'spam'                        # b'..' != '..' in 3.0 (bytes/str)
>>> C = bytearray(B)
>>> C
bytearray(b'spam')


Once created, bytearray objects are sequences of small integers like bytes and are mu- 
table like lists, though they require an integer for index assignments, not a string (all 
of the following is a continuation of this session and is run under Python 3.0 unless 
otherwise noted—see comments for 2.6 usage notes): 

# Mutable, but must assign ints, not strings

>>> C[0]
115

>>> C[0] = 'x'                         # This and the next work in 2.6
TypeError: an integer is required

>>> C[0] = b'x'
TypeError: an integer is required

>>> C[0] = ord('x')
>>> C
bytearray(b'xpam')

>>> C[1] = b'Y'[0]
>>> C
bytearray(b'xYam')


Processing bytearray objects borrows from both strings and lists, since they are mutable 
byte strings. Besides named methods, the __iadd__ and __setitem__ methods in 
bytearray implement += in-place concatenation and index assignment, respectively: 


# Methods overlap with both str and bytes, but also has list's mutable methods

>>> set(dir(b'abc')) - set(dir(bytearray(b'abc')))
{'__getnewargs__'}

>>> set(dir(bytearray(b'abc'))) - set(dir(b'abc'))
{'insert', '__alloc__', 'reverse', 'extend', '__delitem__', 'pop', '__setitem__',
'__iadd__', 'remove', 'append', '__imul__'}


You can change a bytearray in-place with both index assignment, as you’ve just seen, 
and list-like methods like those shown here (to change text in-place in 2.6, you would 
need to convert to and then from a list, with list(str) and ''.join(list)): 


# Mutable method calls

>>> C
bytearray(b'xYam')

>>> C.append(b'LMN')                   # 2.6 requires string of size 1
TypeError: an integer is required

>>> C.append(ord('L'))
>>> C
bytearray(b'xYamL')

>>> C.extend(b'MNO')
>>> C
bytearray(b'xYamLMNO')


All the usual sequence operations and string methods work on bytearrays, as you would 
expect (notice that like bytes objects, their expressions and methods expect bytes 
arguments, not str arguments): 


# Sequence operations and string methods

>>> C + b'!#'
bytearray(b'xYamLMNO!#')

>>> C[0]
120

>>> C[1:]
bytearray(b'YamLMNO')

>>> len(C)
8

>>> C
bytearray(b'xYamLMNO')

>>> C.replace('xY', 'sp')              # This works in 2.6
TypeError: Type str doesn't support the buffer API

>>> C.replace(b'xY', b'sp')
bytearray(b'spamLMNO')

>>> C
bytearray(b'xYamLMNO')

>>> C * 4
bytearray(b'xYamLMNOxYamLMNOxYamLMNOxYamLMNO')


Finally, by way of summary, the following examples demonstrate how bytes and 
bytearray objects are sequences of ints, and str objects are sequences of characters: 


# Binary versus text

>>> B                               # B is same as S in 2.6
b'spam'
>>> list(B)
[115, 112, 97, 109]

>>> C
bytearray(b'xYamLMNO')
>>> list(C)
[120, 89, 97, 109, 76, 77, 78, 79]

>>> S
'spam'
>>> list(S)
['s', 'p', 'a', 'm']
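
The same contrast can be replayed as a short script. The following is only a sketch, run under a recent Python 3.X rather than the 3.0 session above; the variable names are chosen for illustration and are not part of the original session:

```python
# Sketch: bytes and bytearray items are ints; str items are 1-character strings.
S = 'spam'
B = S.encode('ascii')          # immutable bytes with the same character codes
C = bytearray(B)               # mutable copy of those byte values

C[0] = ord('x')                # index assignment takes an int, not a str
C.extend(b'LMN')               # list-like in-place growth
C += b'!'                      # in-place concatenation (__iadd__)

items_bytes = list(B)          # a list of ints
items_str = list(S)            # a list of 1-character strings
text_again = bytes(C).decode('ascii')   # back to str by decoding
```

Note that only the bytearray is changed in-place here; B and S still hold their original values after the script runs.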

Although all three Python 3.0 string types can contain character values and support
many of the same operations, again, you should always:

* Use str for textual data.
* Use bytes for binary data.
* Use bytearray for binary data you wish to change in-place.


Related tools such as files, the next section’s topic, often make the choice for you. 


Using Text and Binary Files 


This section expands on the impact of Python 3.0’s string model on the file processing 
basics introduced earlier in the book. As mentioned earlier, the mode in which you 
open a file is crucial—it determines which object type you will use to represent the file’s 
content in your script. Text mode implies str objects, and binary mode implies bytes 
objects: 


* Text-mode files interpret file contents according to a Unicode encoding: either the
  default for your platform, or one whose name you pass in. By passing in an encoding
  name to open, you can force conversions for various types of Unicode files.
  Text-mode files also perform universal line-end translations: by default, all line-end
  forms map to the single '\n' character in your script, regardless of the platform on
  which you run it. As described earlier, text files also handle reading and writing
  the byte order mark (BOM) stored at the start-of-file in some Unicode encoding
  schemes.

* Binary-mode files instead return file content to you raw, as a sequence of integers
  representing byte values, with no encoding or decoding and no line-end
  translations.


The second argument to open determines whether you want text or binary processing, 
just as it does in 2.X Python—adding a “b” to this string implies binary mode (e.g., 
"rb" to read binary data files). The default mode is "rt"; this is the same as "r", which 
means text input (just as in 2.X). 


In 3.0, though, this mode argument to open also implies an object type for file content 
representation, regardless of the underlying platform—text files return a str for reads 
and expect one for writes, but binary files return a bytes for reads and expect one (or 
a bytearray) for writes. 
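
As a minimal sketch of this rule, the following script writes and reads one scratch file in both modes. It assumes a recent Python 3.X; the temporary path and the variable names are hypothetical, used only for illustration:

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'temp.txt')

with open(path, 'w', encoding='utf-8') as f:     # text mode: expects a str
    f.write('abc\n')

with open(path, 'r', encoding='utf-8') as f:     # text mode: returns a str
    as_text = f.read()

with open(path, 'rb') as f:                      # binary mode: returns bytes
    as_bytes = f.read()

with open(path, 'wb') as f:                      # binary mode: expects bytes...
    f.write(b'abc')
    f.write(bytearray(b'def'))                   # ...or a bytearray

with open(path, 'rb') as f:
    rewritten = f.read()
```

The text read comes back as 'abc\n' on any platform, but the raw bytes stored in the file depend on the platform's line-end convention.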


Text File Basics 


To demonstrate, let’s begin with basic file I/O. As long as you’re processing basic text
files (e.g., ASCII) and don’t care about circumventing the platform-default encoding of
strings, files in 3.0 look and feel much as they do in 2.X (for that matter, so do strings
in general). The following, for instance, writes one line of text to a file and reads it back
in 3.0, exactly as it would in 2.6 (note that file is no longer a built-in name in 3.0, so
it’s perfectly OK to use it as a variable here):


C:\misc> c:\python30\python

# Basic text files (and strings) work the same as in 2.X

>>> file = open('temp', 'w')
>>> size = file.write('abc\n')      # Returns number of characters written
>>> file.close()                    # Manual close to flush output buffer

>>> file = open('temp')             # Default mode is "r" (== "rt"): text input
>>> text = file.read()
>>> text
'abc\n'
>>> print(text)
abc


Text and Binary Modes in 3.0 


In Python 2.6, there is no major distinction between text and binary files—both accept 
and return content as str strings. The only major difference is that text files automat- 
ically map \n end-of-line characters to and from \r\n on Windows, while binary files 
do not (I’m stringing operations together into one-liners here just for brevity): 


C:\misc> c:\python26\python

>>> open('temp', 'w').write('abd\n')        # Write in text mode: adds \r
>>> open('temp', 'r').read()                # Read in text mode: drops \r
'abd\n'
>>> open('temp', 'rb').read()               # Read in binary mode: verbatim
'abd\r\n'

>>> open('temp', 'wb').write('abc\n')       # Write in binary mode
>>> open('temp', 'r').read()                # \n not expanded to \r\n
'abc\n'
>>> open('temp', 'rb').read()
'abc\n'


In Python 3.0, things are a bit more complex because of the distinction between str for
text data and bytes for binary data. To demonstrate, let’s write a text file and read it 
back in both modes in 3.0. Notice that we are required to provide a str for writing, but 
reading gives us a str or a bytes, depending on the open mode: 


C:\misc> c:\python30\python

# Write and read a text file

>>> open('temp', 'w').write('abc\n')        # Text mode output, provide a str
4
>>> open('temp', 'r').read()                # Text mode input, returns a str
'abc\n'
>>> open('temp', 'rb').read()               # Binary mode input, returns a bytes
b'abc\r\n'


Notice how on Windows text-mode files translate the \n end-of-line character to \r\n
on output; on input, text mode translates the \r\n back to \n, but binary mode does
not. This is the same in 2.6, and it’s what we want for binary data (no translations
should occur), although you can control this behavior with extra open arguments in 3.0
(the newline argument) if desired.
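
To make the line-end translations concrete, here is a small sketch that sets open's newline argument explicitly, so that it behaves the same on any platform. It assumes a recent Python 3.X; the scratch path and variable names are hypothetical:

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'temp.txt')

# newline='' on output: '\n' is written unchanged, whatever the platform
with open(path, 'w', newline='') as f:
    f.write('abc\n')
with open(path, 'rb') as f:
    raw_untranslated = f.read()

# newline='\r\n' on output: each '\n' is written as '\r\n', whatever the platform
with open(path, 'w', newline='\r\n') as f:
    f.write('abc\n')
with open(path, 'rb') as f:
    raw_crlf = f.read()

# Default input maps '\r\n' back to '\n'; newline='' on input leaves it alone
with open(path, 'r') as f:
    translated = f.read()
with open(path, 'r', newline='') as f:
    untranslated = f.read()
```

Binary mode remains the simpler choice when no translation is wanted at all; the newline argument matters mostly when a text file must use a specific line-end form.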


Now let’s do the same again, but with a binary file. We provide a bytes to write in this 
case, and we still get back a str or a bytes, depending on the input mode: 


# Write and read a binary file

>>> open('temp', 'wb').write(b'abc\n')      # Binary mode output, provide a bytes
4
>>> open('temp', 'r').read()                # Text mode input, returns a str
'abc\n'
>>> open('temp', 'rb').read()               # Binary mode input, returns a bytes
b'abc\n'


Note that the \n end-of-line character is not expanded to \r\n in binary-mode output— 
again, a desired result for binary data. Type requirements and file behavior are the same 
even if the data we’re writing to the binary file is truly binary in nature. In the following, 
for example, the "\x00" is a binary zero byte and not a printable character: 


# Write and read truly binary data

>>> open('temp', 'wb').write(b'a\x00c')     # Provide a bytes
3
>>> open('temp', 'r').read()                # Receive a str
'a\x00c'
>>> open('temp', 'rb').read()               # Receive a bytes
b'a\x00c'


Binary-mode files always return contents as a bytes object, but accept either a bytes or 
bytearray object for writing; this naturally follows, given that bytearray is basically just 
a mutable variant of bytes. In fact, most APIs in Python 3.0 that accept a bytes also 
allow a bytearray: 


# bytearrays work too

>>> BA = bytearray(b'\x01\x02\x03')

>>> open('temp', 'wb').write(BA)
3
>>> open('temp', 'r').read()
'\x01\x02\x03'
>>> open('temp', 'rb').read()
b'\x01\x02\x03'


Type and Content Mismatches 


Notice that you cannot get away with violating Python’s str/bytes type distinction 
when it comes to files. As the following examples illustrate, we get errors (shortened 
here) if we try to write a bytes to a text file or a str to a binary file: 


# Types are not flexible for file content

>>> open('temp', 'w').write('abc\n')        # Text mode makes and requires str
4
>>> open('temp', 'w').write(b'abc\n')
TypeError: can't write bytes to text stream

>>> open('temp', 'wb').write(b'abc\n')      # Binary mode makes and requires bytes
4
>>> open('temp', 'wb').write('abc\n')
TypeError: can't write str to binary stream

This makes sense: text has no meaning in binary terms, before it is encoded. Although 
it is often possible to convert between the types by encoding str and decoding bytes, 
as described earlier in this chapter, you will usually want to stick to either str for text 
data or bytes for binary data. Because the str and bytes operation sets largely intersect, 
the choice won’t be much of a dilemma for most programs (see the string tools coverage 
in the final section of this chapter for some prime examples of this). 


In addition to type constraints, file content can matter in 3.0. Text-mode output files 
require a str instead of a bytes for content, so there is no way in 3.0 to write truly binary 
data to a text-mode file. Depending on the encoding rules, bytes outside the default 
character set can sometimes be embedded in a normal string, and they can always be 
written in binary mode. However, because text-mode input files in 3.0 must be able to 
decode content per a Unicode encoding, there is no way to read truly binary data in 
text mode: 


# Can't read truly binary data in text mode

>>> chr(0xFF)                                    # FF is a valid char, FE is not
'ÿ'
>>> chr(0xFE)
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1...

>>> open('temp', 'w').write(b'\xFF\xFE\xFD')     # Can't use arbitrary bytes!
TypeError: can't write bytes to text stream

>>> open('temp', 'w').write('\xFF\xFE\xFD')      # Can write if embeddable in str
3
>>> open('temp', 'wb').write(b'\xFF\xFE\xFD')    # Can also write in binary mode
3

>>> open('temp', 'rb').read()                    # Can always read as binary bytes
b'\xff\xfe\xfd'

>>> open('temp', 'r').read()                     # Can't read text unless decodable!
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-3: ...


This last error stems from the fact that all text files in 3.0 are really Unicode text files, 
as the next section describes. 
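
The type and content constraints just described can be checked in a script by catching the exceptions involved. This is a sketch only, assuming a recent Python 3.X and an explicit UTF-8 encoding rather than the Windows platform default used in the session above; the scratch path and names are hypothetical:

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'temp.bin')

errors = []
try:
    with open(path, 'w', encoding='utf-8') as f:
        f.write(b'abc')                     # bytes to a text-mode file
except TypeError:
    errors.append('bytes to text file')

try:
    with open(path, 'wb') as f:
        f.write('abc')                      # str to a binary-mode file
except TypeError:
    errors.append('str to binary file')

with open(path, 'wb') as f:
    f.write(b'\xff\xfe\xfd')                # arbitrary bytes: fine in binary mode

with open(path, 'rb') as f:
    raw = f.read()                          # can always read as raw bytes

try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()                            # these bytes are not valid UTF-8
    decoded = True
except UnicodeDecodeError:
    decoded = False
```

With a different encoding the last read might succeed (latin-1, for instance, maps every byte value to a character), which is why the outcome of reading binary data as text depends on the encoding in effect.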


Using Unicode Files 


So far, we’ve been reading and writing basic text and binary files, but what about
processing Unicode files? It turns out to be easy to read and write Unicode text stored in
files, because the 3.0 open call accepts an encoding for text files, which does the
encoding and decoding for us automatically as data is transferred. This allows us to
process Unicode text created with encodings other than the default for the platform,
and to convert text by storing it in a different encoding.


Reading and Writing Unicode in 3.0 


In fact, we can convert a string to different encodings both manually with method calls 
and automatically on file input and output. We’ll use the following Unicode string in 
this section to demonstrate: 


C:\misc> c:\python30\python

>>> S = 'A\xc4B\xe8C'               # 5-character string, non-ASCII
>>> S
'AÄBèC'
>>> len(S)
5


Manual encoding 


As we’ve already learned, we can always encode such a string to raw bytes according 
to the target encoding name: 


# Encode manually with methods 


>>> L = S.encode('latin-1') # 5 bytes when encoded as latin-1 
>>> L 

b'A\xc4B\xe8C' 

>>> len(L) 

5 

>>> U = S.encode('utf-8') # 7 bytes when encoded as utf-8 
>>> U 

b'A\xc3\x84B\xc3\xa8C' 

>>> len(U) 

7 

File output encoding


Now, to write our string to a text file in a particular encoding, we can simply pass the 
desired encoding name to open—although we could manually encode first and write in 
binary mode, there’s no need to: 


# Encoding automatically when written

>>> open('latindata', 'w', encoding='latin-1').write(S)     # Write as latin-1
5
>>> open('utf8data', 'w', encoding='utf-8').write(S)        # Write as utf-8
5

>>> open('latindata', 'rb').read()                          # Read raw bytes
b'A\xc4B\xe8C'

>>> open('utf8data', 'rb').read()                           # Different in files
b'A\xc3\x84B\xc3\xa8C'


File input decoding 


Similarly, to read arbitrary Unicode data, we simply pass in the file’s encoding type 
name to open, and it decodes from raw bytes to strings automatically; we could read 
raw bytes and decode manually too, but that can be tricky when reading in blocks (we 
might read an incomplete character), and it isn’t necessary: 


# Decoding automatically when read

>>> open('latindata', 'r', encoding='latin-1').read()       # Decoded on input
'AÄBèC'
>>> open('utf8data', 'r', encoding='utf-8').read()          # Per encoding type
'AÄBèC'

>>> X = open('latindata', 'rb').read()                      # Manual decoding:
>>> X.decode('latin-1')                                     # Not necessary
'AÄBèC'
>>> X = open('utf8data', 'rb').read()
>>> X.decode()                                              # UTF-8 is default
'AÄBèC'
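
To see why manual decoding is tricky when reading in blocks, consider the following sketch. It uses the standard library's codecs.getincrementaldecoder, which buffers incomplete characters between calls; the decode_in_blocks helper and the two-byte block size are hypothetical, chosen only to force a split in the middle of a character:

```python
import codecs

S = 'A\xc4B\xe8C'                  # same 5-character string used in this section
raw = S.encode('utf-8')            # 7 bytes: two of the characters take 2 bytes

def decode_in_blocks(data, blocksize, encoding='utf-8'):
    """Decode data a block at a time (hypothetical helper for illustration)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), blocksize):
        block = data[start:start + blocksize]
        pieces.append(decoder.decode(block))        # buffers partial characters
    pieces.append(decoder.decode(b'', final=True))  # flush; errors if incomplete
    return ''.join(pieces)

text = decode_in_blocks(raw, 2)

try:
    raw[:2].decode('utf-8')        # b'A\xc3' ends in the middle of a character
    naive_block_ok = True
except UnicodeDecodeError:
    naive_block_ok = False
```

Text-mode files opened with an encoding do this sort of buffering for you, which is one reason to prefer them over decoding raw blocks by hand.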


Decoding mismatches 


Finally, keep in mind that this behavior of files in 3.0 limits the kind of content you can 
load as text. As suggested in the prior section, Python 3.0 really must be able to decode 
the data in text files into a str string, according to either the default or a passed-in 
Unicode encoding name. Trying to open a truly binary data file in text mode, for ex- 
ample, is unlikely to work in 3.0 even if you use the correct object types: 

>>> file = open('python.exe', 'r') 

>>> text = file.read() 

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2: ... 


>>> file = open('python.exe', 'rb') 
>>> data = file.read()
>>> data[:20]
b'MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00\xb8\x00\x00\x00'


The first of these examples might not fail in Python 2.X (normal files do not decode 
text), even though it probably should: reading the file may return corrupted data in the 
string, due to automatic end-of-line translations in text mode (any embedded \r\n bytes 
will be translated to \n on Windows when read). To treat file content as Unicode text 
in 2.6, we need to use special tools instead of the general open built-in function, as we'll 
see in a moment. First, though, let’s turn to a more explosive topic... 


Handling the BOM in 3.0 


As described earlier in this chapter, some encoding schemes store a special byte order 
marker (BOM) sequence at the start of files, to specify data endianness or declare the 
encoding type. Python both skips this marker on input and writes it on output if the 
encoding name implies it, but we sometimes must use a specific encoding name to force 
BOM processing explicitly. 


For example, when you save a text file in Windows Notepad, you can specify its
encoding type in a drop-down list: simple ASCII text, UTF-8, or little- or big-endian
UTF-16. If a two-line text file named spam.txt is saved in Notepad as the encoding type
“ANSI,” for instance, it’s written as simple ASCII text without a BOM. When this file
is read in binary mode in Python, we can see the actual bytes stored in the file. When
it’s read as text, Python performs end-of-line translation by default; we can decode it
as explicit UTF-8 text since ASCII is a subset of this scheme (and UTF-8 is Python 3.0’s
default encoding):


c:\misc> C:\Python30\python                         # File saved in Notepad
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> open('spam.txt', 'rb').read()                   # ASCII (UTF-8) text file
b'spam\r\nSPAM\r\n'
>>> open('spam.txt', 'r').read()                    # Text mode translates line-end
'spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8').read()
'spam\nSPAM\n'


If this file is instead saved as “UTF-8” in Notepad, it is prepended with a three-byte
UTF-8 BOM sequence, and we need to give a more specific encoding name
(“utf-8-sig”) to force Python to skip the marker:

>>> open('spam.txt', 'rb').read()                   # UTF-8 with 3-byte BOM
b'\xef\xbb\xbfspam\r\nSPAM\r\n'
>>> open('spam.txt', 'r').read()
'ï»¿spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8').read()
'\ufeffspam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8-sig').read()
'spam\nSPAM\n'




If the file is stored as “Unicode big endian” in Notepad, we get UTF-16-format data in
the file, prepended with a two-byte BOM sequence. The encoding name “utf-16” in
Python skips the BOM because it is implied (files read with this generic name are
expected to start with a BOM), and “utf-16-be” handles the big-endian format but does
not skip the BOM:

>>> open('spam.txt', 'rb').read()
b'\xfe\xff\x00s\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n'
>>> open('spam.txt', 'r').read()
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1:...
>>> open('spam.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-16-be').read()
'\ufeffspam\nSPAM\n'


The same is generally true for output. When writing a Unicode file in Python code, we
need a more explicit encoding name to force the BOM in UTF-8: “utf-8” does not
write (or skip) the BOM, but “utf-8-sig” does:

>>> open('temp.txt', 'w', encoding='utf-8').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read()                           # No BOM
b'spam\r\nSPAM\r\n'

>>> open('temp.txt', 'w', encoding='utf-8-sig').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read()                           # Wrote BOM
b'\xef\xbb\xbfspam\r\nSPAM\r\n'

>>> open('temp.txt', 'r').read()
'ï»¿spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8').read()          # Keeps BOM
'\ufeffspam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8-sig').read()      # Skips BOM
'spam\nSPAM\n'


Notice that although “utf-8” does not drop the BOM, data without a BOM can be read
with both “utf-8” and “utf-8-sig”; use the latter for input if you’re not sure whether a
BOM is present in a file (and don’t read this paragraph out loud in an airport security
line!):

>>> open('temp.txt', 'w').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read()                           # Data without BOM
b'spam\r\nSPAM\r\n'

>>> open('temp.txt', 'r').read()                            # Any utf-8 works
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8').read()
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8-sig').read()
'spam\nSPAM\n'


Finally, for the encoding name “utf-16,” the BOM is handled automatically: on
output, data is written in the platform’s native endianness, and the BOM is always written;
on input, data is decoded per the BOM, and the BOM is always stripped. More specific
UTF-16 encoding names can specify different endianness, though you may have to
manually write and skip the BOM yourself in some scenarios if it is required or present:

>>> sys.byteorder
'little'
>>> open('temp.txt', 'w', encoding='utf-16').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read()
b'\xff\xfes\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n\x00'
>>> open('temp.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'

>>> open('temp.txt', 'w', encoding='utf-16-be').write('\ufeffspam\nSPAM\n')
11
>>> open('temp.txt', 'rb').read()
b'\xfe\xff\x00s\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n'
>>> open('temp.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-16-be').read()
'\ufeffspam\nSPAM\n'


The more specific UTF-16 encoding names work fine with BOM-less files, though
“utf-16” requires one on input in order to determine byte order:

>>> open('temp.txt', 'w', encoding='utf-16-le').write('SPAM')
4
>>> open('temp.txt', 'rb').read()                   # OK if BOM not present or expected
b'S\x00P\x00A\x00M\x00'
>>> open('temp.txt', 'r', encoding='utf-16-le').read()
'SPAM'
>>> open('temp.txt', 'r', encoding='utf-16').read()
UnicodeError: UTF-16 stream does not start with BOM


Experiment with these encodings yourself or see Python’s library manuals for more 
details on the BOM. 
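
As a starting point for such experiments, the following sketch applies the same encoding names to in-memory strings with encode and decode, which removes the line-end translations that files add. It assumes a recent Python 3.X; the codecs module's BOM_UTF8, BOM_UTF16_LE, and BOM_UTF16_BE constants are standard, while the variable names are only illustrative:

```python
import codecs

text = 'spam\nSPAM\n'

plain = text.encode('utf-8')              # no BOM written
with_sig = text.encode('utf-8-sig')       # 3-byte BOM prepended

has_bom = with_sig.startswith(codecs.BOM_UTF8)
kept = with_sig.decode('utf-8')           # BOM survives as '\ufeff'
skipped = with_sig.decode('utf-8-sig')    # BOM dropped
also_ok = plain.decode('utf-8-sig')       # no BOM present: still fine

u16 = text.encode('utf-16')               # BOM plus platform-native byte order
u16_has_bom = u16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
u16_be = text.encode('utf-16-be')         # explicit byte order: no BOM written
```

Decoding u16 with 'utf-16' recovers the original text on either kind of machine, because the decoder reads the byte order from the BOM.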


Unicode Files in 2.6 


The preceding discussion applies to Python 3.0’s string types and files. You can achieve
similar effects for Unicode files in 2.6, but the interface is different. If you replace str
with unicode and open with codecs.open, the result is essentially the same in 2.6:


C:\misc> c:\python26\python
>>> S = u'A\xc4B\xe8C'
>>> print S
AÄBèC
>>> len(S)
5
>>> S.encode('latin-1')
'A\xc4B\xe8C'
>>> S.encode('utf-8')
'A\xc3\x84B\xc3\xa8C'

>>> import codecs
>>> codecs.open('latindata', 'w', encoding='latin-1').write(S)
>>> codecs.open('utfdata', 'w', encoding='utf-8').write(S)

>>> open('latindata', 'rb').read()
'A\xc4B\xe8C'
>>> open('utfdata', 'rb').read()
'A\xc3\x84B\xc3\xa8C'

>>> codecs.open('latindata', 'r', encoding='latin-1').read()
u'A\xc4B\xe8C'
>>> codecs.open('utfdata', 'r', encoding='utf-8').read()
u'A\xc4B\xe8C'


Other String Tool Changes in 3.0 


Some of the other popular string-processing tools in Python’s standard library have 
been revamped for the new str/bytes type dichotomy too. We won’t cover any of these 
application-focused tools in much detail in this core language book, but to wrap up 
this chapter, here’s a quick look at four of the major tools impacted: the re pattern- 
matching module, the struct binary data module, the pickle object serialization mod- 
ule, and the xml package for parsing XML text. 


The re Pattern Matching Module 


Python’s re pattern-matching module supports text processing that is more general 
than that afforded by simple string method calls such as find, split, and replace. With 
re, strings that designate searching and splitting targets can be described by general 
patterns, instead of absolute text. This module has been generalized to work on objects 
of any string type in 3.0—str, bytes, and bytearray—and returns result substrings of 
the same type as the subject string. 


Here it is at work in 3.0, extracting substrings from a line of text. Within pattern strings, 
(.*) means any character (.), zero or more times (*), saved away as a matched substring 
(()). Parts of the string matched by the parts of a pattern enclosed in parentheses are 
available after a successful match, via the group or groups method: 


C:\misc> c:\python30\python
>>> import re
>>> S = 'Bugger all down here on earth!'              # Line of text
>>> B = b'Bugger all down here on earth!'             # Usually from a file

>>> re.match('(.*) down (.*) on (.*)', S).groups()    # Match line to pattern
('Bugger all', 'here', 'earth!')                      # Matched substrings

>>> re.match(b'(.*) down (.*) on (.*)', B).groups()   # bytes substrings
(b'Bugger all', b'here', b'earth!')


In Python 2.6 results are similar, but the unicode type is used for non-ASCII text, and 
str handles both 8-bit and binary text: 

C:\misc> c:\python26\python
>>> import re
>>> S = 'Bugger all down here on earth!'              # Simple text and binary
>>> U = u'Bugger all down here on earth!'             # Unicode text

>>> re.match('(.*) down (.*) on (.*)', S).groups()
('Bugger all', 'here', 'earth!')

>>> re.match('(.*) down (.*) on (.*)', U).groups()
(u'Bugger all', u'here', u'earth!')


Since bytes and str support essentially the same operation sets, this type distinction is 
largely transparent. But note that, like in other APIs, you can’t mix str and bytes types 
in its calls’ arguments in 3.0 (although if you don’t plan to do pattern matching on 
binary data, you probably don’t need to care): 

C:\misc> c:\python30\python
>>> import re
>>> S = 'Bugger all down here on earth!'
>>> B = b'Bugger all down here on earth!'

>>> re.match('(.*) down (.*) on (.*)', B).groups()
TypeError: can't use a string pattern on a bytes-like object

>>> re.match(b'(.*) down (.*) on (.*)', S).groups()
TypeError: can't use a bytes pattern on a string-like object

>>> re.match(b'(.*) down (.*) on (.*)', bytearray(B)).groups()
(bytearray(b'Bugger all'), bytearray(b'here'), bytearray(b'earth!'))

>>> re.match('(.*) down (.*) on (.*)', bytearray(B)).groups()
TypeError: can't use a string pattern on a bytes-like object
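
When one pattern must serve both text and binary subjects, a common workaround is to convert the pattern to the subject's type before matching. The match_any function below is a hypothetical helper written for this sketch, not part of the re module, and it assumes an ASCII-only pattern and a recent Python 3.X:

```python
import re

line_str = 'Bugger all down here on earth!'
line_bytes = line_str.encode('ascii')
pattern = '(.*) down (.*) on (.*)'

def match_any(pattern, subject):
    """Match a str pattern against a str or bytes-like subject (illustrative)."""
    if isinstance(subject, (bytes, bytearray)) and isinstance(pattern, str):
        pattern = pattern.encode('ascii')      # pattern type must match subject
    return re.match(pattern, subject)

str_groups = match_any(pattern, line_str).groups()
bytes_groups = match_any(pattern, line_bytes).groups()

try:
    re.match(pattern, line_bytes)              # mixed types: str pattern, bytes data
    mismatch = None
except TypeError:
    mismatch = 'TypeError'
```

The matched substrings come back as str for the str subject and as bytes for the bytes subject, mirroring the sessions shown above.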


The struct Binary Data Module 


The Python struct module, used to create and extract packed binary data from strings, 
also works the same in 3.0 as it does in 2.X, but packed data is represented as bytes 
and bytearray objects only, not str objects (which makes sense, given that it’s intended 
for processing binary data, not arbitrarily encoded text). 


Here are both Pythons in action, packing three objects into a string according to a binary 
type specification (they create a four-byte integer, a four-byte string, and a two-byte 
integer): 


C:\misc> c:\python30\python
>>> from struct import pack
>>> pack('>i4sh', 7, 'spam', 8)             # bytes in 3.0 (8-bit string)
b'\x00\x00\x00\x07spam\x00\x08'

C:\misc> c:\python26\python
>>> from struct import pack
>>> pack('>i4sh', 7, 'spam', 8)             # str in 2.6 (8-bit string)
'\x00\x00\x00\x07spam\x00\x08'

Since bytes has an almost identical interface to that of str in 3.0 and 2.6, though, most
programmers probably won’t need to care—the change is irrelevant to most existing 
code, especially since reading from a binary file creates a bytes automatically. Although 
the last test in the following example fails on a type mismatch, most scripts will read 
binary data from a file, not create it as a string: 

C:\misc> c:\python30\python 

>>> import struct 

>>> B = struct.pack('>i4sh', 7, 'spam', 8) 

>>> B 

b'\x00\x00\x00\x07spam\x00\x08' 


>>> vals = struct.unpack('>i4sh', B) 
>>> vals 
(7, b'spam', 8) 


>>> vals = struct.unpack('>i4sh', B.decode()) 

TypeError: 'str' does not have the buffer interface 
Apart from the new syntax for bytes, creating and reading binary files works almost the 
same in 3.0 as it does in 2.X. Code like this is one of the main places where programmers 
will notice the bytes object type: 


C:\misc> c:\python30\python

# Write values to a packed binary file

>>> F = open('data.bin', 'wb')                      # Open binary output file
>>> import struct
>>> data = struct.pack('>i4sh', 7, 'spam', 8)       # Create packed binary data
>>> data                                            # bytes in 3.0, not str
b'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(data)                                   # Write to the file
10
>>> F.close()

# Read values from a packed binary file

>>> F = open('data.bin', 'rb')                      # Open binary input file
>>> data = F.read()                                 # Read bytes
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> values = struct.unpack('>i4sh', data)           # Extract packed binary data
>>> values                                          # Back to Python objects
(7, b'spam', 8)
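
The same round trip can be written as a short script. This is a sketch for a recent Python 3.X, where the 's' format code requires a bytes value (b'spam') rather than the str accepted in the 3.0 session above; the FORMAT constant and the unpacked variable names are illustrative:

```python
import struct

FORMAT = '>i4sh'          # big-endian: 4-byte int, 4-byte string, 2-byte int

packed = struct.pack(FORMAT, 7, b'spam', 8)    # bytes value for the 's' field
size = struct.calcsize(FORMAT)                 # total bytes the format occupies

number, label, flag = struct.unpack(FORMAT, packed)
label_text = label.decode('ascii')             # decode only if it is really text
```

The string field comes back as a bytes object; whether to decode it to a str is the program's decision, since struct knows nothing about text encodings.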


Once you’ve extracted packed binary data into Python objects like this, you can dig
even further into the binary world if you have to: strings can be indexed and sliced to
get individual bytes’ values, individual bits can be extracted from integers with bitwise
operators, and so on (see earlier in this book for more on the operations applied here):

>>> values                                  # Result of struct.unpack
(7, b'spam', 8)

# Accessing bits of parsed integers

>>> bin(values[0])                          # Can get to bits in ints
'0b111'
>>> values[0] & 0x01                        # Test first (lowest) bit in int
1
>>> values[0] | 0b1010                      # Bitwise or: turn bits on
15
>>> bin(values[0] | 0b1010)                 # 15 decimal is 1111 binary
'0b1111'
>>> bin(values[0] ^ 0b1010)                 # Bitwise xor: off if both true
'0b1101'
>>> bool(values[0] & 0b100)                 # Test if bit 3 is on
True
>>> bool(values[0] & 0b1000)                # Test if bit 4 is set
False

Since parsed bytes strings are sequences of small integers, we can do similar processing 
with their individual bytes: 


# Accessing bytes of parsed strings and bits within them

>>> values[1]
b'spam'
>>> values[1][0]                            # bytes string: sequence of ints
115
>>> values[1][1:]                           # Prints as ASCII characters
b'pam'
>>> bin(values[1][0])                       # Can get to bits of bytes in strings
'0b1110011'
>>> bin(values[1][0] | 0b1100)              # Turn bits on
'0b1111111'
>>> values[1][0] | 0b1100
127
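
If you test bits often, the expressions above can be wrapped in small functions. The bit_is_set helper below is hypothetical, written for this sketch; it numbers bits from 1 at the low end, matching the "bit 3" and "bit 4" wording of the comments above:

```python
def bit_is_set(value, position):
    """Return True if the bit at 1-based position (1 = lowest) is on."""
    return bool(value & (1 << (position - 1)))

flags = 7                                  # 0b111, as unpacked earlier
first = bit_is_set(flags, 1)
third = bit_is_set(flags, 3)
fourth = bit_is_set(flags, 4)

data = bytearray(b'spam')                  # mutable, so bits can be set in-place
data[0] |= 0b1100                          # turn bits on in the first byte
first_byte = data[0]
```

A bytearray is used for the second half because bytes objects are immutable; with a bytes object, the or expression would produce a new int but could not be stored back.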


Of course, most Python programmers don’t deal with binary bits; Python has higher- 
level object types, like lists and dictionaries, that are generally a better choice for 
representing information in Python scripts. However, if you must use or produce 
lower-level data used by C programs, networking libraries, or other interfaces, Python 
has tools to assist. 


The pickle Object Serialization Module 


We met the pickle module briefly in Chapters 9 and 30. In Chapter 27, we also used 
the shelve module, which uses pickle internally. For completeness here, keep in mind 
that the Python 3.0 version of the pickle module always creates a bytes object, regard- 
less of the default or passed-in “protocol” (data format level). You can see this by using 
the module’s dumps call to return an object’s pickle string: 


C:\misc> C:\Python30\python
>>> import pickle                               # dumps() returns pickle string

>>> pickle.dumps([1, 2, 3])                     # Python 3.0 default protocol=3=binary
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'

>>> pickle.dumps([1, 2, 3], protocol=0)         # ASCII protocol 0, but still bytes!
b'(lp0\nL1L\naL2L\naL3L\na.'


This implies that files used to store pickled objects must always be opened in binary 
mode in Python 3.0, since text files use str strings to represent data, not bytes—the 
dump call simply attempts to write the pickle string to an open output file: 


>>> pickle.dump([1, 2, 3], open('temp', 'w')) # Text files fail on bytes! 
TypeError: can't write bytes to text stream # Despite protocol value 


>>> pickle.dump([1, 2, 3], open('temp', 'w'), protocol=0) 
TypeError: can't write bytes to text stream 


>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Always use binary in 3.0 


>>> open('temp', 'r').read() 
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in ... 


Because pickle data is not decodable Unicode text, the same is true on input: correct
usage in 3.0 requires always writing and reading pickle data in binary modes:

>>> pickle.dump([1, 2, 3], open('temp', 'wb'))
>>> pickle.load(open('temp', 'rb'))
[1, 2, 3]
>>> open('temp', 'rb').read()
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'


In Python 2.6 (and earlier), we can get by with text-mode files for pickled data, as long 
as the protocol is level 0 (the default in 2.6) and we use text mode consistently to convert 
line-ends: 


C:\misc> c:\python26\python
>>> import pickle
>>> pickle.dumps([1, 2, 3])                         # Python 2.6 default=0=ASCII
'(lp0\nI1\naI2\naI3\na.'

>>> pickle.dumps([1, 2, 3], protocol=1)
']q\x00(K\x01K\x02K\x03e.'

>>> pickle.dump([1, 2, 3], open('temp', 'w'))       # Text mode works in 2.6
>>> pickle.load(open('temp'))
[1, 2, 3]
>>> open('temp').read()
'(lp0\nI1\naI2\naI3\na.'


If you care about version neutrality, though, or don’t want to care about protocols or 
their version-specific defaults, always use binary-mode files for pickled data—the fol- 
lowing works the same in Python 3.0 and 2.6: 


>>> import pickle 


>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Version neutral 
>>> pickle.load(open('temp', 'rb')) # And required in 3.0 
[1, 2, 3] 
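
In a script, the same version-neutral pattern is often written with explicit file objects so that the files are closed promptly. The following is a sketch for a recent Python 3.X; the scratch path and variable names are hypothetical:

```python
import os
import pickle
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'temp.pkl')

with open(path, 'wb') as f:                # binary mode for output
    pickle.dump([1, 2, 3], f)

with open(path, 'rb') as f:                # binary mode for input
    restored = pickle.load(f)

as_ascii = pickle.dumps([1, 2, 3], protocol=0)   # protocol 0 is still bytes
```

The exact bytes produced by dumps vary with the protocol and Python version, so programs normally compare unpickled objects rather than the pickle strings themselves.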

Because almost all programs let Python pickle and unpickle objects automatically and
do not deal with the content of pickled data itself, the requirement to always use binary 
file modes is the only significant incompatibility in Python 3’s new pickling model. See 
reference books or Python’s manuals for more details on object pickling. 


XML Parsing Tools 


XML is a tag-based language for defining structured information, commonly used to 
define documents and data shipped over the Web. Although some information can be 
extracted from XML text with basic string methods or the re pattern module, XML’s 
nesting of constructs and arbitrary attribute text tend to make full parsing more 
accurate. 


Because XML is such a pervasive format, Python itself comes with an entire package of 
XML parsing tools that support the SAX and DOM parsing models, as well as a package 
known as ElementTree—a Python-specific API for parsing and constructing XML. 
Beyond basic parsing, the open source domain provides support for additional XML 
tools, such as XPath, XQuery, XSLT, and more.


XML by definition represents text in Unicode form, to support internationalization. 
Although most of Python’s XML parsing tools have always returned Unicode strings, 
in Python 3.0 their results have mutated from the 2.X unicode type to the 3.0 general 
str string type—which makes sense, given that 3.0’s str string is Unicode, whether the 
encoding is ASCII or other. 


We can’t go into many details here, but to sample the flavor of this domain, suppose 
we have a simple XML text file, mybooks.xml: 
<books>
    <date>2009</date>
    <title>Learning Python</title>
    <title>Programming Python</title>
    <title>Python Pocket Reference</title>
    <publisher>O'Reilly Media</publisher>
</books>


and we want to run a script to extract and display the content of all the nested title 
tags, as follows: 
Learning Python
Programming Python
Python Pocket Reference


There are at least four basic ways to accomplish this (not counting more advanced tools 
like XPath). First, we could run basic pattern matching on the file’s text, though this 
tends to be inaccurate if the text is unpredictable. Where applicable, the re module we 
met earlier does the job—its match method looks for a match at the start of a string, 
search scans ahead for a match, and the findall method used here locates all places 
where the pattern matches in the string (the result comes back as a list of matched 
substrings corresponding to parenthesized pattern groups, or tuples of such for multiple
groups):

# File patternparse.py

import re
text = open('mybooks.xml').read()
found = re.findall('<title>(.*)</title>', text)
for title in found: print(title)
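
To make the difference between the three re calls concrete, and to show why pattern matching can be inaccurate on unpredictable text, here is a small sketch. The one-line sample string is an assumption chosen to trigger the problem, and the non-greedy pattern (.*?) is a variation on the script above, not part of it:

```python
import re

# Hypothetical sample: both titles on one line, unlike mybooks.xml
text = '<books><title>A</title><title>B</title></books>'

at_start = re.match('<title>', text)        # Matches only at the start of the string
first = re.search('<title>', text)          # Scans ahead for the first match
greedy = re.findall('<title>(.*)</title>', text)    # .* grabs as much as it can
lazy = re.findall('<title>(.*?)</title>', text)     # .*? stops at the first </title>

print(at_start)          # No match: the text starts with <books>
print(first.start())     # Offset where the first <title> begins
print(greedy)            # One over-long match spanning both titles
print(lazy)              # One match per title
```

The greedy pattern happens to work on mybooks.xml only because each title sits on its own line and . does not match a newline by default.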


Second, to be more robust, we could perform complete XML parsing with the standard 
library’s DOM parsing support. DOM parses XML text into a tree of objects and pro- 
vides an interface for navigating the tree to extract tag attributes and values; the inter- 
face is a formal specification, independent of Python: 


# File domparse.py

from xml.dom.minidom import parse, Node
xmltree = parse('mybooks.xml')
for node1 in xmltree.getElementsByTagName('title'):
    for node2 in node1.childNodes:
        if node2.nodeType == Node.TEXT_NODE:
            print(node2.data)


As a third option, Python’s standard library supports SAX parsing for XML. Under the
SAX model, a class’s methods receive callbacks as a parse progresses and use state 
information to keep track of where they are in the document and collect its data: 


# File saxparse.py

import xml.sax.handler
class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.inTitle = False
    def startElement(self, name, attributes):
        if name == 'title':
            self.inTitle = True
    def characters(self, data):
        if self.inTitle:
            print(data)
    def endElement(self, name):
        if name == 'title':
            self.inTitle = False

import xml.sax
parser = xml.sax.make_parser()
handler = BookHandler()
parser.setContentHandler(handler)
parser.parse('mybooks.xml')


Finally, the ElementTree system available in the etree package of the standard library 
can often achieve the same effects as XML DOM parsers, but with less code. It’s a 
Python-specific way to both parse and generate XML text; after a parse, its API gives 
access to components of the document: 

# File etreeparse.py

from xml.etree.ElementTree import parse
tree = parse('mybooks.xml')
for E in tree.findall('title'):
    print(E.text)
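
For experimenting without a file on disk, the same ElementTree calls can be applied to a string. This is only a sketch: it assumes the XML text is held in a variable named xml_text (a name invented here) and uses the module's fromstring function, which returns the root element directly instead of a tree object:

```python
from xml.etree.ElementTree import fromstring

# Same content as mybooks.xml, held in memory (xml_text is a hypothetical name)
xml_text = """<books>
    <date>2009</date>
    <title>Learning Python</title>
    <title>Programming Python</title>
    <title>Python Pocket Reference</title>
    <publisher>O'Reilly Media</publisher>
</books>"""

root = fromstring(xml_text)                      # root is the <books> element
titles = [elem.text for elem in root.findall('title')]
for title in titles:
    print(title)

print(root.find('publisher').text)               # find returns the first match only
```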


When run in either 2.6 or 3.0, all four of these scripts display the same printed result: 


C:\misc> c:\python26\python domparse.py
Learning Python
Programming Python
Python Pocket Reference

C:\misc> c:\python30\python domparse.py
Learning Python
Programming Python
Python Pocket Reference


Technically, though, in 2.6 some of these scripts produce unicode string objects, while 
in 3.0 all produce str strings, since that type includes Unicode text (whether ASCII or 
other): 

C:\misc> c:\python30\python
>>> from xml.dom.minidom import parse, Node
>>> xmltree = parse('mybooks.xml')
>>> for node in xmltree.getElementsByTagName('title'):
        for node2 in node.childNodes:
            if node2.nodeType == Node.TEXT_NODE:
                node2.data

'Learning Python'
'Programming Python'
'Python Pocket Reference'

C:\misc> c:\python26\python
>>> ...same code...

u'Learning Python'
u'Programming Python'
u'Python Pocket Reference'

Programs that must deal with XML parsing results in nontrivial ways will need to ac- 
count for the different object type in 3.0. Again, though, because all strings have nearly 
identical interfaces in both 2.6 and 3.0, most scripts won’t be affected by the change; 
tools available on unicode in 2.6 are generally available on str in 3.0. 


Regrettably, going into further XML parsing details is beyond this book’s scope. If you 
are interested in text or XML parsing, it is covered in more detail in the applications- 
focused follow-up book Programming Python. For more details on re, struct, pickle, 
and XML tools in general, consult the Web, the aforementioned book and others, and 
Python’s standard library manual. 


Chapter Summary


This chapter explored advanced string types available in Python 3.0 and 2.6 for pro- 
cessing Unicode text and binary data. As we saw, many programmers use ASCII text 
and can get by with the basic string type and its operations. For more advanced appli- 
cations, Python’s string models fully support both wide-character Unicode text (via the 
normal string type in 3.0 and a special type in 2.6) and byte-oriented data (represented 
with a bytes type in 3.0 and normal strings in 2.6). 


In addition, we learned how Python’s file object has mutated in 3.0 to automatically 
encode and decode Unicode text and deal with byte strings for binary-mode files. Fi- 
nally, we briefly met some text and binary data tools in Python’s library, and sampled 
their behavior in 3.0. 


In the next chapter, we’ll shift our focus to tool-builder topics, with a look at ways to 
manage access to object attributes by inserting automatically run code. Before we move 
on, though, here’s a set of questions to review what we’ve learned here. 


Test Your Knowledge: Quiz 


1. What are the names and roles of string object types in Python 3.0?
2. What are the names and roles of string object types in Python 2.6?
3. What is the mapping between 2.6 and 3.0 string types?
4. How do Python 3.0’s string types differ in terms of operations?
5. How can you code non-ASCII Unicode characters in a string in 3.0?
6. What are the main differences between text- and binary-mode files in Python 3.0?
7. How would you read a Unicode text file that contains text in a different encoding
than the default for your platform?
8. How can you create a Unicode text file in a specific encoding format?
9. Why is ASCII text considered to be a kind of Unicode text?
10. How large an impact does Python 3.0’s string types change have on your code?


Test Your Knowledge: Answers 


1. Python 3.0 has three string types: str (for Unicode text, including ASCII), bytes 
(for binary data with absolute byte values), and bytearray (a mutable flavor of 
bytes). The str type usually represents content stored on a text file, and the other 
two types generally represent content stored on binary files. 
2. Python 2.6 has two main string types: str (for 8-bit text and binary data) and
unicode (for wide-character text). The str type is used for both text and binary file 
content; unicode is used for text file content that is generally more complex than 
8 bits. Python 2.6 (but not earlier) also has 3.0’s bytearray type, but it’s mostly a 
back-port and doesn’t exhibit the sharp text/binary distinction that it does in 3.0. 


3. The mapping from 2.6 to 3.0 string types is not direct, because 2.6’s str equates
to both str and bytes in 3.0, and 3.0’s str equates to both str and unicode in 2.6. 
The mutability of bytearray in 3.0 is also unique. 


4. Python 3.0’s string types share almost all the same operations: method calls, sequence
operations, and even larger tools like pattern matching work the same way.
On the other hand, only str supports string formatting operations, and 
bytearray has an additional set of operations that perform in-place changes. The 
str and bytes types also have methods for encoding and decoding text, 
respectively. 


5. Non-ASCII Unicode characters can be coded in a string with both hex (\xNN) and
Unicode (\uNNNN, \UNNNNNNNN) escapes. On some keyboards, some non-ASCII char- 
acters—certain Latin-1 characters, for example—can also be typed directly. 


6. In 3.0, text-mode files assume their file content is Unicode text (even if it’s ASCII)
and automatically decode when reading and encode when writing. With binary- 
mode files, bytes are transferred to and from the file unchanged. The contents of 
text-mode files are usually represented as str objects in your script, and the con- 
tents of binary files are represented as bytes (or bytearray) objects. Text-mode files 
also handle the BOM for certain encoding types and automatically translate end- 
of-line sequences to and from the single \n character on input and output unless 
this is explicitly disabled; binary-mode files do not perform either of these steps. 


7. To read files encoded in a different encoding than the default for your platform,
simply pass the name of the file’s encoding to the open built-in in 3.0
(codecs.open() in 2.6); data will be decoded per the specified encoding when it is
read from the file. You can also read in binary mode and manually decode the bytes 
to a string by giving an encoding name, but this involves extra work and is some- 
what error-prone for multibyte characters (you may accidentally read a partial 
character sequence). 


8. To create a Unicode text file in a specific encoding format, pass the desired encoding
name to open in 3.0 (codecs.open() in 2.6); strings will be encoded per the
desired encoding when they are written to the file. You can also manually encode 
a string to bytes and write it in binary mode, but this is usually extra work. 


9. ASCII text is considered to be a kind of Unicode text, because its 7-bit range of
values is a subset of most Unicode encodings. For example, valid ASCII text is also 
valid Latin-1 text (Latin-1 simply assigns the remaining possible values in an 8-bit 
byte to additional characters) and valid UTF-8 text (UTF-8 defines a variable-byte 
scheme for representing more characters, but ASCII characters are still represented 
with the same codes, in a single byte). 
10. The impact of Python 3.0’s string types change depends upon the types of strings
you use. For scripts that use simple ASCII text, there is probably no impact at all: 
the str string type works the same in 2.6 and 3.0 in this case. Moreover, although 
string-related tools in the standard library such as re, struct, pickle, and xml may 
technically use different types in 3.0 than in 2.6, the changes are largely irrelevant 
to most programs because 3.0’s str and bytes and 2.6’s str support almost iden- 
tical interfaces. If you process Unicode data, the toolset you need has simply moved 
from 2.6’s unicode and codecs.open() to 3.0’s str and open. If you deal with binary 
data files, you’ll need to deal with content as bytes objects; since they have a similar 
interface to 2.6 strings, though, the impact should again be minimal. 
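
The file-related answers above can be checked with a short 3.X sketch. The file name unidata.txt, the sample string, and the choice of UTF-8 are assumptions made for illustration only:

```python
import os
import tempfile

text = 'sp\xc4m'                                  # One non-ASCII character (U+00C4)
path = os.path.join(tempfile.mkdtemp(), 'unidata.txt')    # Hypothetical file name

with open(path, 'w', encoding='utf-8') as f:      # Text mode: encoded on write
    f.write(text)

with open(path, 'r', encoding='utf-8') as f:      # Text mode: decoded on read
    decoded = f.read()

with open(path, 'rb') as f:                       # Binary mode: raw bytes, unchanged
    raw = f.read()

print(decoded == text)                            # Round trip preserves the str
print(raw)                                        # The non-ASCII character takes 2 bytes
print(raw.decode('utf-8') == text)                # Manual decoding gives the same text

# ASCII text has the same bytes under ASCII, Latin-1, and UTF-8
same = 'spam'.encode('ascii') == 'spam'.encode('latin-1') == 'spam'.encode('utf-8')
print(same)
```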


CHAPTER 37
Managed Attributes 


This chapter expands on the attribute interception techniques introduced earlier, in- 
troduces another, and employs them in a handful of larger examples. Like everything 
in this part of the book, this chapter is classified as an advanced topic and optional 
reading, because most applications programmers don’t need to care about the material 
discussed here—they can fetch and set attributes on objects without concern for at- 
tribute implementations. Especially for tools builders, though, managing attribute ac- 
cess can be an important part of flexible APIs. 


Why Manage Attributes? 


Object attributes are central to most Python programs—they are where we often store 
information about the entities our scripts process. Normally, attributes are simply 
names for objects; a person’s name attribute, for example, might be a simple string, 
fetched and set with basic attribute syntax: 


person.name # Fetch attribute value 
person.name = value # Change attribute value 


In most cases, the attribute lives in the object itself, or is inherited from a class from 
which it derives. That basic model suffices for most programs you will write in your 
Python career. 


Sometimes, though, more flexibility is required. Suppose you’ve written a program to 
use a name attribute directly, but then your requirements change—for example, you 
decide that names should be validated with logic when set or mutated in some way 
when fetched. It’s straightforward to code methods to manage access to the attribute’s 
value (valid and transform are abstract here): 

class Person:
    def getName(self):
        if not valid():
            raise TypeError('cannot fetch name')
        else:
            return self.name.transform()
    def setName(self, value):
        if not valid(value):
            raise TypeError('cannot change name')
        else:
            self.name = transform(value)

person = Person()
person.getName()
person.setName('value')


However, this also requires changing all the places where names are used in the entire 
program—a possibly nontrivial task. Moreover, this approach requires the program to 
be aware of how values are exported: as simple names or called methods. If you begin 
with a method-based interface to data, clients are immune to changes; if you do not, 
they can become problematic. 


This issue can crop up more often than you might expect. The value of a cell in a 
spreadsheet-like program, for instance, might begin its life as a simple discrete value, 
but later mutate into an arbitrary calculation. Since an object’s interface should be 
flexible enough to support such future changes without breaking existing code, switch- 
ing to methods later is less than ideal. 


Inserting Code to Run on Attribute Access 


A better solution would allow you to run code automatically on attribute access, if 
needed. At various points in this book, we’ve met Python tools that allow our scripts 
to dynamically compute attribute values when fetching them and validate or change 
attribute values when storing them. In this chapter, we’re going to expand on the tools
already introduced, explore other available tools, and study some larger use-case ex- 
amples in this domain. Specifically, this chapter presents: 


* The __getattr__ and __setattr__ methods, for routing undefined attribute fetches
and all attribute assignments to generic handler methods.

* The __getattribute__ method, for routing all attribute fetches to a generic handler
method in new-style classes in 2.6 and all classes in 3.0.

* The property built-in, for routing specific attribute access to get and set handler
functions, known as properties.

* The descriptor protocol, for routing specific attribute accesses to instances of classes
with arbitrary get and set handler methods.


The first and third of these were briefly introduced in Part VI; the others are new topics 
introduced and covered here. 


As we'll see, all four techniques share goals to some degree, and it’s usually possible to 
code a given problem using any one of them. They do differ in some important ways, 
though. For example, the last two techniques listed here apply to specific attributes, 
whereas the first two are generic enough to be used by delegation-based classes that 
must route arbitrary attributes to wrapped objects. As we’ll see, all four schemes also
differ in both complexity and aesthetics, in ways you must see in action to judge for 
yourself. 


Besides studying the specifics behind the four attribute interception techniques listed 
in this section, this chapter also presents an opportunity to explore larger programs 
than we’ve seen elsewhere in this book. The CardHolder case study at the end, for ex- 
ample, should serve as a self-study example of larger classes in action. We’ll also be 
using some of the techniques outlined here in the next chapter to code decorators, so 
be sure you have at least a general understanding of these topics before you move on. 


Properties 


The property protocol allows us to route a specific attribute’s get and set operations to 
functions or methods we provide, enabling us to insert code to be run automatically 
on attribute access, intercept attribute deletions, and provide documentation for the 
attributes if desired. 


Properties are created with the property built-in and are assigned to class attributes, 
just like method functions. As such, they are inherited by subclasses and instances, like 
any other class attributes. Their access-interception functions are provided with the 
self instance argument, which grants access to state information and class attributes 
available on the subject instance. 


A property manages a single, specific attribute; although it can’t catch all attribute 
accesses generically, it allows us to control both fetch and assignment accesses and 
enables us to change an attribute from simple data to a computation freely, without 
breaking existing code. As we’ll see, properties are strongly related to descriptors; they 
are essentially a restricted form of them. 


The Basics 


A property is created by assigning the result of a built-in function to a class attribute: 


attribute = property(fget, fset, fdel, doc) 


None of this built-in’s arguments are required, and all default to None if not passed; an
operation whose argument is left as None is not supported, and attempting it will raise an exception. When
using them, we pass fget a function for intercepting attribute fetches, fset a function 
for assignments, and fdel a function for attribute deletions; the doc argument receives 
a documentation string for the attribute, if desired (otherwise the property copies the 
docstring of fget, if provided, which defaults to None). fget returns the computed at- 
tribute value, and fset and fdel return nothing (really, None). 


This built-in call returns a property object, which we assign to the name of the attribute 
to be managed in the class scope, where it will be inherited by every instance. 
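
As a small sketch of the defaults just described, the following property passes only fget, so fetches work but assignments and deletions are refused. The Circle class and its attribute names are hypothetical, and the code assumes 3.X (derive from object in 2.6):

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius               # Actual data, stored under another name

    def get_radius(self):
        return self._radius

    radius = property(get_radius)           # fset, fdel, and doc default to None


c = Circle(2)
print(c.radius)                             # Fetch is supported: runs get_radius

try:
    c.radius = 5                            # No fset was given
except AttributeError:
    print('assignment not supported')

try:
    del c.radius                            # No fdel was given
except AttributeError:
    print('deletion not supported')
```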


A First Example


To demonstrate how this translates to working code, the following class uses a property 
to trace access to an attribute named name; the actual stored data is named _name so it 
does not clash with the property: 


class Person:                           # Use (object) in 2.6
    def __init__(self, name):
        self._name = name
    def getName(self):
        print('fetch...')
        return self._name
    def setName(self, value):
        print('change...')
        self._name = value
    def delName(self):
        print('remove...')
        del self._name
    name = property(getName, setName, delName, "name property docs")

bob = Person('Bob Smith')               # bob has a managed attribute
print(bob.name)                         # Runs getName
bob.name = 'Robert Smith'               # Runs setName
print(bob.name)
del bob.name                            # Runs delName

print('-'*20)
sue = Person('Sue Jones')               # sue inherits property too
print(sue.name)
print(Person.name.__doc__)              # Or help(Person.name)


Properties are available in both 2.6 and 3.0, but they require new-style object derivation 
in 2.6 to work correctly for assignments—add object as a superclass here to run this 
in 2.6 (you can include the superclass in 3.0 too, but it’s implied and not required).


This particular property doesn’t do much—it simply intercepts and traces an 
attribute—but it serves to demonstrate the protocol. When this code is run, two in- 
stances inherit the property, just as they would any other attribute attached to their 
class. However, their attribute accesses are caught: 


fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name property docs


Like all class attributes, properties are inherited by both instances and lower subclasses. 
If we change our example as follows, for example: 




class Super:
    ...the original Person class code...
    name = property(getName, setName, delName, 'name property docs')

class Person(Super):
    pass                                # Properties are inherited

bob = Person('Bob Smith')
...rest unchanged...


the output is the same—the Person subclass inherits the name property from Super, and 
the bob instance gets it from Person. In terms of inheritance, properties work the same 
as normal methods; because they have access to the self instance argument, they can 
access instance state information like methods, as the next section demonstrates. 


Computed Attributes 


The example in the prior section simply traces attribute accesses. Usually, though, 
properties do much more—computing the value of an attribute dynamically when 
fetched, for example. The following example illustrates: 


class PropSquare:
    def __init__(self, start):
        self.value = start
    def getX(self):                     # On attr fetch
        return self.value ** 2
    def setX(self, value):              # On attr assign
        self.value = value
    X = property(getX, setX)            # No delete or docs

P = PropSquare(3)                       # 2 instances of class with property
Q = PropSquare(32)                      # Each has different state information

print(P.X)                              # 3 ** 2
P.X = 4
print(P.X)                              # 4 ** 2
print(Q.X)                              # 32 ** 2


This class defines an attribute X that is accessed as though it were static data, but really 
runs code to compute its value when fetched. The effect is much like an implicit method
call. When the code is run, the value is stored in the instance as state information, but 
each time we fetch it via the managed attribute, its value is automatically squared: 

9
16
1024


Notice that we’ve made two different instances; because property methods automatically
receive a self argument, they have access to the state information stored in instances.
In our case, this means the fetch computes the square of the subject instance’s
data.


Coding Properties with Decorators


Although we’re saving additional details until the next chapter, we introduced function 
decorator basics earlier, in Chapter 31. Recall that the function decorator syntax: 


@decorator 
def func(args): ... 


is automatically translated to this equivalent by Python, to rebind the function name 
to the result of the decorator callable: 


def func(args): ...
func = decorator(func)


Because of this mapping, it turns out that the property built-in can serve as a decorator, 
to define a function that will run automatically when an attribute is fetched: 


class Person:
    @property
    def name(self): ...                 # Rebinds: name = property(name)


When run, the decorated method is automatically passed to the first argument of the 
property built-in. This is really just alternative syntax for creating a property and re- 
binding the attribute name manually: 


class Person:
    def name(self): ...
    name = property(name)


As of Python 2.6, property objects also have getter, setter, and deleter methods that 
assign the corresponding property accessor methods and return a copy of the property 
itself. We can use these to specify components of properties by decorating normal 
methods too, though the getter component is usually filled in automatically by the act 
of creating the property itself: 


class Person:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):                     # name = property(name)
        "name property docs"
        print('fetch...')
        return self._name

    @name.setter
    def name(self, value):              # name = name.setter(name)
        print('change...')
        self._name = value

    @name.deleter
    def name(self):                     # name = name.deleter(name)
        print('remove...')
        del self._name

bob = Person('Bob Smith')               # bob has a managed attribute
print(bob.name)                         # Runs name getter (name 1)
bob.name = 'Robert Smith'               # Runs name setter (name 2)
print(bob.name)
del bob.name                            # Runs name deleter (name 3)

print('-'*20)
sue = Person('Sue Jones')               # sue inherits property too
print(sue.name)
print(Person.name.__doc__)              # Or help(Person.name)


In fact, this code is equivalent to the first example in this section—decoration is just 
an alternative way to code properties in this case. When it’s run, the results are the same: 


fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name property docs


Compared to manual assignment of property results, in this case using decorators to 
code properties requires just three extra lines of code (a negligible difference). As is so 
often the case with alternative tools, the choice between the two techniques is largely 
subjective. 


Descriptors 


Descriptors provide an alternative way to intercept attribute access; they are strongly 
related to the properties discussed in the prior section. In fact, a property is a kind of 
descriptor—technically speaking, the property built-in is just a simplified way to create 
a specific type of descriptor that runs method functions on attribute accesses. 


Functionally speaking, the descriptor protocol allows us to route a specific attribute’s 
get and set operations to methods of a separate class object that we provide: they pro- 
vide a way to insert code to be run automatically on attribute access, and they allow us 
to intercept attribute deletions and provide documentation for the attributes if desired. 


Descriptors are created as independent classes, and they are assigned to class attributes 
just like method functions. Like any other class attribute, they are inherited by sub- 
classes and instances. Their access-interception methods are provided with both a 
self for the descriptor itself, and the instance of the client class. Because of this, they 
can retain and use state information of their own, as well as state information of the 
subject instance. For example, a descriptor may call methods available in the client 
class, as well as descriptor-specific methods it defines. 

Like a property, a descriptor manages a single, specific attribute; although it can’t catch
all attribute accesses generically, it provides control over both fetch and assignment 
accesses and allows us to change an attribute freely from simple data to a computation 
without breaking existing code. Properties really are just a convenient way to create a 
specific kind of descriptor, and as we shall see, they can be coded as descriptors directly. 


Whereas properties are fairly narrow in scope, descriptors provide a more general 
solution. For instance, because they are coded as normal classes, descriptors have their 
own state, may participate in descriptor inheritance hierarchies, can use composition 
to aggregate objects, and provide a natural structure for coding internal methods and 
attribute documentation strings. 


The Basics 


As mentioned previously, descriptors are coded as separate classes and provide spe- 
cially named accessor methods for the attribute access operations they wish to 
intercept—get, set, and deletion methods in the descriptor class are automatically run 
when the attribute assigned to the descriptor class instance is accessed in the corre- 
sponding way: 

class Descriptor:
    "docstring goes here"
    def __get__(self, instance, owner): ...        # Return attr value
    def __set__(self, instance, value): ...        # Return nothing (None)
    def __delete__(self, instance): ...            # Return nothing (None)


Classes with any of these methods are considered descriptors, and their methods are 
special when one of their instances is assigned to another class’s attribute—when the 
attribute is accessed, they are automatically invoked. If any of these methods are absent, 
it generally means that the corresponding type of access is not supported. Unlike with 
properties, however, omitting a __set__ allows the name to be redefined in an instance,
thereby hiding the descriptor—to make an attribute read-only, you must define 
__set__ to catch assignments and raise an exception. 
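
Because descriptors are ordinary class instances, they can keep state of their own alongside the state of the client instance. The following is a minimal sketch of that idea, assuming 3.X; the Counted class, its fetches counter, and the storage_name argument are hypothetical names introduced only for this illustration:

```python
class Counted:
    "Hypothetical descriptor that counts fetches of the attribute it manages"
    def __init__(self, storage_name):
        self.storage_name = storage_name    # Where the value lives on the client
        self.fetches = 0                    # Descriptor state, shared by all clients

    def __get__(self, instance, owner):
        if instance is None:                # Accessed through the class itself
            return self
        self.fetches += 1
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)


class Person:
    name = Counted('_name')                 # One descriptor object per class
    def __init__(self, name):
        self.name = name                    # Routed to Counted.__set__


bob, sue = Person('Bob'), Person('Sue')
print(bob.name, sue.name)                   # Two fetches, two different instances
print(Person.name.fetches)                  # Count is kept in the descriptor
```

Since a single descriptor object is attached to the class, its counter is shared by every Person; per-instance data still has to live on the instance, as _name does here.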


Descriptor method arguments 


Before we code anything realistic, let’s take a brief look at some fundamentals. All three 
descriptor methods outlined in the prior section are passed both the descriptor class 
instance (self) and the instance of the client class to which the descriptor instance is 
attached (instance). 


The __get__ access method additionally receives an owner argument, specifying the class
to which the descriptor instance is attached. Its instance argument is either the instance 
through which the attribute was accessed (for instance.attr), or None when the at- 
tribute is accessed through the owner class directly (for class.attr). The former of 
these generally computes a value for instance access, and the latter usually returns 
self if descriptor object access is supported. 

For example, in the following, when X.attr is fetched, Python automatically runs the
__get__ method of the Descriptor class to which the Subject.attr class attribute is 
assigned (as with properties, in Python 2.6 we must derive from object to use descrip- 
tors here; in 3.0 this is implied, but doesn’t hurt): 

>>> class Descriptor(object):
        def __get__(self, instance, owner):
            print(self, instance, owner, sep='\n')

>>> class Subject:
        attr = Descriptor()              # Descriptor instance is class attr

>>> X = Subject()

>>> X.attr
<__main__.Descriptor object at 0x0281E690>
<__main__.Subject object at 0x028289B0>
<class '__main__.Subject'>

>>> Subject.attr
<__main__.Descriptor object at 0x0281E690>
None
<class '__main__.Subject'>


Notice the arguments automatically passed in to the __get__ method in the first attribute
fetch: when X.attr is fetched, it’s as though the following translation occurs
(though the Subject.attr here doesn’t invoke __get__ again):


X.attr -> Descriptor.__get__(Subject.attr, X, Subject)


The descriptor knows it is being accessed directly when its instance argument is None. 
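
The translation and the None test can both be verified with a short sketch. It assumes 3.X, and the returned string is an arbitrary placeholder value; the descriptor object is fetched through Subject.__dict__ so that the fetch itself does not trigger __get__:

```python
class Descriptor:
    def __get__(self, instance, owner):
        if instance is None:                # Accessed through the class: Subject.attr
            return self                     # Common choice: hand back the descriptor
        return 'value for %s' % type(instance).__name__


class Subject:
    attr = Descriptor()


X = Subject()
raw = Subject.__dict__['attr']              # The descriptor object, no __get__ call

print(X.attr)                               # Instance access: computed value
print(Subject.attr is raw)                  # Class access: descriptor itself
print(X.attr == Descriptor.__get__(raw, X, Subject))   # The translation, spelled out
```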


Read-only descriptors 


As mentioned earlier, unlike with properties, with descriptors simply omitting the 
__set__ method isn’t enough to make an attribute read-only, because the descriptor 
name can be assigned to an instance. In the following, the attribute assignment to 
X.a stores a in the instance object X, thereby hiding the descriptor stored in class C: 


>>> class D:
        def __get__(*args): print('get')

>>> class C:
        a = D()

>>> X = C()
>>> X.a                                  # Runs inherited descriptor __get__
get
>>> X.a = 99                             # Stored on X, hiding C.a
>>> X.a
99
>>> list(X.__dict__.keys())
['a']
>>> Y = C()
>>> Y.a                                  # Y still inherits descriptor
get
>>> C.a
get

This is the way all instance attribute assignments work in Python, and it allows classes 
to selectively override class-level defaults in their instances. To make a descriptor-based 
attribute read-only, catch the assignment in the descriptor class and raise an exception 
to prevent attribute assignment—when assigning an attribute that is a descriptor, Py- 
thon effectively bypasses the normal instance-level assignment behavior and routes the 
operation to the descriptor object: 

>>> class D:
        def __get__(*args): print('get')
        def __set__(*args): raise AttributeError('cannot set')

>>> class C:
        a = D()

>>> X = C()
>>> X.a                                  # Routed to C.a.__get__
get
>>> X.a = 99                             # Routed to C.a.__set__
AttributeError: cannot set


Also be careful not to confuse the descriptor __delete__ method with the general
__del__ method. The former is called on attempts to delete the managed attribute name
on an instance of the owner class; the latter is the general instance destructor method,
run when an instance of any kind of class is about to be garbage collected. __delete__ is
more closely related to the __delattr__ generic attribute deletion method we’ll meet
later in this chapter. See Chapter 29 for more on operator overloading methods.


A First Example 


To see how this all comes together in more realistic code, let’s get started with the same 
first example we wrote for properties. The following defines a descriptor that intercepts 
access to an attribute named name in its clients. Its methods use their instance argument 
to access state information in the subject instance, where the name string is actually 
stored. Like properties, descriptors work properly only for new-style classes, so be sure 
to derive both classes in the following from object if you’re using 2.6: 


class Name:                             # Use (object) in 2.6
    "name descriptor docs"
    def __get__(self, instance, owner):
        print('fetch...')
        return instance._name
    def __set__(self, instance, value):
        print('change...')
        instance._name = value
    def __delete__(self, instance):
        print('remove...')
        del instance._name

class Person:                           # Use (object) in 2.6
    def __init__(self, name):
        self._name = name
    name = Name()                       # Assign descriptor to attr

bob = Person('Bob Smith')               # bob has a managed attribute
print(bob.name)                         # Runs Name.__get__
bob.name = 'Robert Smith'               # Runs Name.__set__
print(bob.name)
del bob.name                            # Runs Name.__delete__

print('-'*20)
sue = Person('Sue Jones')               # sue inherits descriptor too
print(sue.name)
print(Name.__doc__)                     # Or help(Name)


Notice in this code how we assign an instance of our descriptor class to a class
attribute in the client class; because of this, it is inherited by all instances of the class,
just like a class’s methods. Really, we must assign the descriptor to a class attribute like
this: it won’t work if assigned to a self instance attribute instead. When the
descriptor’s __get__ method is run, it is passed three objects to define its context:


• self is the Name class instance.
• instance is the Person class instance.
• owner is the Person class.


When this code is run, the descriptor’s methods intercept accesses to the attribute, much
like the property version. In fact, the output is the same again:


fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name descriptor docs


Also like in the property example, our descriptor class instance is a class attribute and 
thus is inherited by all instances of the client class and any subclasses. If we change the 
Person class in our example to the following, for instance, the output of our script is 
the same: 


class Super:
    def __init__(self, name):
        self._name = name
    name = Name()

class Person(Super):                    # Descriptors are inherited
    pass


Also note that when a descriptor class is not useful outside the client class, it’s perfectly 
reasonable to embed the descriptor’s definition inside its client syntactically. Here’s 
what our example looks like if we use a nested class: 


class Person:
    def __init__(self, name):
        self._name = name

    class Name:                         # Using a nested class
        "name descriptor docs"
        def __get__(self, instance, owner):
            print('fetch...')
            return instance._name
        def __set__(self, instance, value):
            print('change...')
            instance._name = value
        def __delete__(self, instance):
            print('remove...')
            del instance._name
    name = Name()


When coded this way, Name becomes a local variable in the scope of the Person class
statement, such that it won’t clash with any names outside the class. This version works
the same as the original (we’ve simply moved the descriptor class definition into the
client class’s scope), but the last line of the testing code must change to fetch the
docstring from its new location:


print(Person.Name.__doc__)              # Differs: not Name.__doc__ outside class


Computed Attributes 


As was the case when using properties, our first descriptor example of the prior section
didn’t do much; it simply printed trace messages for attribute accesses. In practice,
descriptors can also be used to compute attribute values each time they are fetched.
The following illustrates. It’s a rehash of the same example we coded for properties,
which uses a descriptor to automatically square an attribute’s value each time it is
fetched:


class DescSquare:
    def __init__(self, start):                  # Each desc has own state
        self.value = start
    def __get__(self, instance, owner):         # On attr fetch
        return self.value ** 2
    def __set__(self, instance, value):         # On attr assign
        self.value = value                      # No delete or docs

class Client1:
    X = DescSquare(3)          # Assign descriptor instance to class attr

class Client2:
    X = DescSquare(32)         # Another instance in another client class
                               # Could also code 2 instances in same class
c1 = Client1()
c2 = Client2()

print(c1.X)                    # 3 ** 2
c1.X = 4
print(c1.X)                    # 4 ** 2
print(c2.X)                    # 32 ** 2


When run, the output of this example is the same as that of the original property-based 
version, but here a descriptor class object is intercepting the attribute accesses: 


9
16
1024


Using State Information in Descriptors 


If you study the two descriptor examples we’ve written so far, you might notice that 
they get their information from different places—the first (the name attribute example) 
uses data stored on the client instance, and the second (the attribute squaring example) 
uses data attached to the descriptor object itself. In fact, descriptors can use both in- 
stance state and descriptor state, or any combination thereof: 


• Descriptor state is used to manage data internal to the workings of the descriptor.
• Instance state records information related to and possibly created by the client
  class.


Descriptor methods may use either, but descriptor state often makes it unnecessary to 
use special naming conventions to avoid name collisions for descriptor data stored on 
an instance. For example, the following descriptor attaches information to its own 
instance, so it doesn’t clash with that on the client class’s instance: 


class DescState:                           # Use descriptor state
    def __init__(self, value):
        self.value = value
    def __get__(self, instance, owner):    # On attr fetch
        print('DescState get')
        return self.value * 10
    def __set__(self, instance, value):    # On attr assign
        print('DescState set')
        self.value = value

# Client class

class CalcAttrs:
    X = DescState(2)                       # Descriptor class attr
    Y = 3                                  # Class attr
    def __init__(self):
        self.Z = 4                         # Instance attr

obj = CalcAttrs()
print(obj.X, obj.Y, obj.Z)                 # X is computed, others are not
obj.X = 5                                  # X assignment is intercepted
obj.Y = 6
obj.Z = 7
print(obj.X, obj.Y, obj.Z)


This code’s value information lives only in the descriptor, so there won’t be a collision
if the same name is used in the client’s instance. Notice that only the descriptor attribute
is managed here: get and set accesses to X are intercepted, but accesses to Y and Z are
not (Y is attached to the client class and Z to the instance). When this code is run, X is
computed when fetched:


DescState get
20 3 4
DescState set
DescState get
50 6 7


It’s also feasible for a descriptor to store or use an attribute attached to the client class’s
instance, instead of itself. The descriptor in the following example assumes the instance
has an attribute _Y attached by the client class, and uses it to compute the value of the
attribute it represents:


class InstState:                           # Using instance state
    def __get__(self, instance, owner):
        print('InstState get')             # Assume set by client class
        return instance._Y * 100
    def __set__(self, instance, value):
        print('InstState set')
        instance._Y = value

# Client class

class CalcAttrs:
    X = DescState(2)                       # Descriptor class attr
    Y = InstState()                        # Descriptor class attr
    def __init__(self):
        self._Y = 3                        # Instance attr
        self.Z = 4                         # Instance attr

obj = CalcAttrs()
print(obj.X, obj.Y, obj.Z)                 # X and Y are computed, Z is not
obj.X = 5                                  # X and Y assignments intercepted
obj.Y = 6
obj.Z = 7
print(obj.X, obj.Y, obj.Z)


This time, X and Y are both assigned to descriptors and computed when fetched (X is
assigned the descriptor of the prior example). The new descriptor here has no
information itself, but it uses an attribute assumed to exist in the instance. That attribute
is named _Y, to avoid collisions with the name of the descriptor itself. When this version
is run the results are similar, but a second attribute is managed, using state that lives
in the instance instead of the descriptor:


DescState get 
InstState get 
20 300 4 

DescState set 
InstState set 
DescState get 
InstState get 
50 600 7 


Both descriptor and instance state have roles. In fact, this is a general advantage that 
descriptors have over properties—because they have state of their own, they can easily 
retain data internally, without adding it to the namespace of the client instance object. 
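
One consequence worth keeping in mind is that descriptor state belongs to the descriptor
object, which is a single class attribute shared by every instance of the client class. The
following sketch contrasts the two storage choices; it assumes Python 3.X, and the names
SharedState, PerInstanceState, and Client are hypothetical, introduced only for
illustration:

```python
class SharedState:                       # State on the descriptor: one copy per class attr
    def __init__(self, value):
        self.value = value
    def __get__(self, instance, owner):
        return self.value
    def __set__(self, instance, value):
        self.value = value

class PerInstanceState:                  # State on the client instance: one copy per object
    def __get__(self, instance, owner):
        return instance._data
    def __set__(self, instance, value):
        instance._data = value

class Client:
    shared = SharedState(1)
    own = PerInstanceState()
    def __init__(self, start):
        self.own = start                 # Routed to PerInstanceState.__set__

a, b = Client(10), Client(20)
a.shared = 99                            # Changes the single descriptor object
print(a.shared, b.shared)                # 99 99: both instances see the change
print(a.own, b.own)                      # 10 20: each instance keeps its own value
```

Descriptor state is a natural fit for data that is truly internal to the descriptor, while
per-object values generally belong on the instance.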


How Properties and Descriptors Relate 


As mentioned earlier, properties and descriptors are strongly related—the property 
built-in is just a convenient way to create a descriptor. Now that you know how both 
work, you should also be able to see that it’s possible to simulate the property built-in 
with a descriptor class like the following: 


class Property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel                                  # Save unbound methods
        self.__doc__ = doc                                # or other callables

    def __get__(self, instance, instancetype=None):
        if instance is None:
            return self
        if self.fget is None:
            raise AttributeError("can't get attribute")
        return self.fget(instance)                        # Pass instance to self
                                                          # in property accessors
    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)

    def __delete__(self, instance):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(instance)

class Person:
    def getName(self): ...
    def setName(self, value): ...
    name = Property(getName, setName)                     # Use like property()


This Property class catches attribute accesses with the descriptor protocol and routes
requests to functions or methods passed in and saved in descriptor state when the class
is created. Attribute fetches, for example, are routed from the Person class, to the
Property class’s __get__ method, and back to the Person class’s getName. With
descriptors, this “just works.”


Note that this descriptor class equivalent only handles basic property usage, though; 
to use @ decorator syntax to also specify set and delete operations, our Property class 
would also have to be extended with setter and deleter methods, which would save 
the decorated accessor function and return the property object (self should suffice). 
Since the property built-in already does this, we’ll omit a formal coding of this extension 
here. 
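
For readers who want to see the shape of that extension anyway, the following is one
possible sketch, not a formal coding. It assumes Python 3.X, repeats the class so that it
runs on its own, and takes the simple route of returning self from setter and deleter;
the Person class and its _name attribute here are illustrative only:

```python
class Property:
    "Sketch only: a descriptor emulating property, plus decorator hooks"
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget, self.fset, self.fdel = fget, fset, fdel
        self.__doc__ = doc

    def __get__(self, instance, instancetype=None):
        if instance is None:
            return self
        if self.fget is None:
            raise AttributeError("can't get attribute")
        return self.fget(instance)

    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)

    def __delete__(self, instance):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(instance)

    def setter(self, fset):              # Used as @name.setter
        self.fset = fset                 # Save the decorated accessor
        return self                      # Rebind the name to this same object

    def deleter(self, fdel):             # Used as @name.deleter
        self.fdel = fdel
        return self

class Person:
    def __init__(self, name):
        self._name = name
    @Property
    def name(self):                      # name = Property(name)
        return self._name
    @name.setter
    def name(self, value):               # name = name.setter(name)
        self._name = value
    @name.deleter
    def name(self):                      # name = name.deleter(name)
        del self._name

bob = Person('Bob Smith')
bob.name = 'Robert Smith'                # Runs Property.__set__, then the setter
print(bob.name)                          # Robert Smith
del bob.name                             # Runs Property.__delete__, then the deleter
```

Returning self mutates the one descriptor object in place, which is enough for this usage;
an implementation could instead return a fresh copy from each of these methods.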


Also note that descriptors are used to implement Python’s __slots__; instance attribute
dictionaries are avoided by intercepting slot names with descriptors stored at the class
level. See Chapter 31 for more on slots.


In Chapter 38, we’ll also make use of descriptors to implement function decorators that
apply to both functions and methods. As you’ll see there, because descriptors receive
both descriptor and subject class instances they work well in this role, though nested
functions are usually a simpler solution.


__getattr__ and __getattribute__


So far, we’ve studied properties and descriptors, tools for managing specific attributes.
The __getattr__ and __getattribute__ operator overloading methods provide still
other ways to intercept attribute fetches for class instances. Like properties and
descriptors, they allow us to insert code to be run automatically when attributes are
accessed; as we’ll see, though, these two methods can be used in more general ways.


Attribute fetch interception comes in two flavors, coded with two different methods:


• __getattr__ is run for undefined attributes; that is, attributes not stored on an
  instance or inherited from one of its classes.
• __getattribute__ is run for every attribute, so when using it you must be cautious
  to avoid recursive loops by passing attribute accesses to a superclass.


We met the former of these in Chapter 29; it’s available for all Python versions. The
latter of these is available for new-style classes in 2.6, and for all (implicitly new-style)
classes in 3.0. These two methods are representatives of a set of attribute interception
methods that also includes __setattr__ and __delattr__. Because these methods have
similar roles, we will generally treat them as a single topic here.


Unlike properties and descriptors, these methods are part of Python’s operator
overloading protocol: specially named methods of a class, inherited by subclasses, and run
automatically when instances are used in the implied built-in operation. Like all
methods of a class, they each receive a first self argument when called, giving access to any
required instance state information or other methods of the class.


The __getattr__ and __getattribute__ methods are also more generic than properties
and descriptors: they can be used to intercept access to any (or even all) instance
attribute fetches, not just the specific name to which they are assigned. Because of this,
these two methods are well suited to general delegation-based coding patterns; they
can be used to implement wrapper objects that manage all attribute accesses for an
embedded object. By contrast, we must define one property or descriptor for every
attribute we wish to intercept.


Finally, these two methods are more narrowly focused than the alternatives we
considered earlier: they intercept attribute fetches only, not assignments. To also catch
attribute changes by assignment, we must code a __setattr__ method, an operator
overloading method run for every attribute assignment, which must take care to avoid
recursive loops by routing attribute assignments through the instance namespace dictionary.


Although much less common, we can also code a __delattr__ overloading method
(which must avoid looping in the same way) to intercept attribute deletions. By
contrast, properties and descriptors catch get, set, and delete operations by design.


Most of these operator overloading methods were introduced earlier in the book; here, 
we'll expand on their usage and study their roles in larger contexts. 


The Basics 


__getattr__ and __setattr__ were introduced in Chapters 29 and 31, and
__getattribute__ was mentioned briefly in Chapter 31. In short, if a class defines or
inherits the following methods, they will be run automatically when an instance is used
in the context described by the comments to the right:


def __getattr__(self, name):          # On undefined attribute fetch [obj.name]
def __getattribute__(self, name):     # On all attribute fetch [obj.name]
def __setattr__(self, name, value):   # On all attribute assignment [obj.name=value]
def __delattr__(self, name):          # On all attribute deletion [del obj.name]


In all of these, self is the subject instance object as usual, name is the string name of 
the attribute being accessed, and value is the object being assigned to the attribute. The 
two get methods normally return an attribute’s value, and the other two return nothing 
(None). For example, to catch every attribute fetch, we can use either of the first two 
methods above, and to catch every attribute assignment we can use the third: 


class Catcher:
    def __getattr__(self, name):
        print('Get:', name)
    def __setattr__(self, name, value):
        print('Set:', name, value)

X = Catcher()
X.job                               # Prints "Get: job"
X.pay                               # Prints "Get: pay"
X.pay = 99                          # Prints "Set: pay 99"


Such a coding structure can be used to implement the delegation design pattern we met
earlier, in Chapter 30. Because all attributes are routed to our interception methods
generically, we can validate and pass them along to embedded, managed objects. The
following class (borrowed from Chapter 30), for example, traces every attribute fetch
made to another object passed to the wrapper class:


class Wrapper:
    def __init__(self, object):
        self.wrapped = object                      # Save object
    def __getattr__(self, attrname):
        print('Trace:', attrname)                  # Trace fetch
        return getattr(self.wrapped, attrname)     # Delegate fetch


There is no such analog for properties and descriptors, short of coding accessors for 
every possible attribute in every possibly wrapped object. 
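
To make the delegation concrete, here is a brief usage sketch that repeats the Wrapper
class so it can be run on its own. The wrapped list and dictionary are arbitrary choices
for illustration, and the code assumes Python 3.X:

```python
class Wrapper:
    def __init__(self, object):
        self.wrapped = object                      # Save object
    def __getattr__(self, attrname):
        print('Trace:', attrname)                  # Trace fetch
        return getattr(self.wrapped, attrname)     # Delegate fetch

x = Wrapper([1, 2, 3])                   # Wrap a list
x.append(4)                              # Prints "Trace: append", then delegates
print(x.wrapped)                         # [1, 2, 3, 4]: wrapped is a real attr, not traced

y = Wrapper({'a': 1, 'b': 2})            # Wrap a dictionary
print(sorted(y.keys()))                  # Prints "Trace: keys", then ['a', 'b']
```

Note that fetching x.wrapped is not traced: the name is stored on the instance, so it is
not undefined and __getattr__ is never run for it.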


Avoiding loops in attribute interception methods 


These methods are generally straightforward to use; their only complex part is the
potential for looping (a.k.a. recursing). Because __getattr__ is called for undefined
attributes only, it can freely fetch other attributes within its own code. However,
because __getattribute__ and __setattr__ are run for all attributes, their code needs to
be careful when accessing other attributes to avoid calling themselves again and
triggering a recursive loop.


For example, another attribute fetch run inside a __getattribute__ method’s code will
trigger __getattribute__ again, and the code will recurse until Python’s recursion limit
is exceeded and an exception is raised:


def __getattribute__(self, name):
    x = self.other                                # LOOPS!


To work around this, route the fetch through a higher superclass instead to skip this
level’s version; the object class is always a superclass, and it serves well in this role:


def __getattribute__(self, name):
    x = object.__getattribute__(self, 'other')    # Force higher to avoid me


For __setattr__, the situation is similar; assigning any attribute inside this method
triggers __setattr__ again and creates a similar loop:


def __setattr__(self, name, value):
    self.other = value                            # LOOPS!


To work around this problem, assign the attribute as a key in the instance’s __dict__
namespace dictionary instead. This avoids direct attribute assignment:


def __setattr__(self, name, value):
    self.__dict__['other'] = value                # Use attr dict to avoid me


Although it’s a less common approach, __setattr__ can also pass its own attribute
assignments to a higher superclass to avoid looping, just like __getattribute__:


def __setattr__(self, name, value):
    object.__setattr__(self, 'other', value)      # Force higher to avoid me


By contrast, though, we cannot use the __dict__ trick to avoid loops in
__getattribute__:


def __getattribute__(self, name):
    x = self.__dict__['other']                    # LOOPS!


Fetching the __dict__ attribute itself triggers __getattribute__ again, causing a
recursive loop. Strange but true!


The __delattr__ method is rarely used in practice, but when it is, it is called for every
attribute deletion (just as __setattr__ is called for every attribute assignment).
Therefore, you must take care to avoid loops when deleting attributes, by using the same
techniques: namespace dictionaries or superclass method calls.


A First Example 


All this is not nearly as complicated as the prior section may have implied. To see how 
to put these ideas to work, here is the same first example we used for properties and 
descriptors in action again, this time implemented with attribute operator overloading 
methods. Because these methods are so generic, we test attribute names here to know 
when a managed attribute is being accessed; others are allowed to pass normally: 


class Person:
    def __init__(self, name):               # On [Person()]
        self._name = name                   # Triggers __setattr__!

    def __getattr__(self, attr):            # On [obj.undefined]
        if attr == 'name':                  # Intercept name: not stored
            print('fetch...')
            return self._name               # Does not loop: real attr
        else:                               # Others are errors
            raise AttributeError(attr)

    def __setattr__(self, attr, value):     # On [obj.any = value]
        if attr == 'name':
            print('change...')
            attr = '_name'                  # Set internal name
        self.__dict__[attr] = value         # Avoid looping here

    def __delattr__(self, attr):            # On [del obj.any]
        if attr == 'name':
            print('remove...')
            attr = '_name'                  # Avoid looping here too
        del self.__dict__[attr]             # but much less common

bob = Person('Bob Smith')                   # bob has a managed attribute
print(bob.name)                             # Runs __getattr__
bob.name = 'Robert Smith'                   # Runs __setattr__
print(bob.name)
del bob.name                                # Runs __delattr__

print('-'*20)
sue = Person('Sue Jones')                   # sue inherits property too
print(sue.name)
#print(Person.name.__doc__)                 # No equivalent here


Notice that the attribute assignment in the __init__ constructor triggers __setattr__
too; this method catches every attribute assignment, even those within the class itself.
When this code is run, the same output is produced, but this time it’s the result of
Python’s normal operator overloading mechanism and our attribute interception
methods:


fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones


Also note that, unlike with properties and descriptors, there’s no direct notion of spec- 
ifying documentation for our attribute here; managed attributes exist within the code 
of our interception methods, not as distinct objects. 


To achieve exactly the same results with __getattribute__, replace __getattr__ in the
example with the following; because it catches all attribute fetches, this version must
be careful to avoid looping by passing new fetches to a superclass, and it can’t generally
assume unknown names are errors:


# Replace __getattr__ with this

    def __getattribute__(self, attr):                # On [obj.any]
        if attr == 'name':                           # Intercept all names
            print('fetch...')
            attr = '_name'                           # Map to internal name
        return object.__getattribute__(self, attr)   # Avoid looping here


This example is equivalent to that coded for properties and descriptors, but it’s a bit
artificial, and it doesn’t really highlight these tools in practice. Because they are generic,
__getattr__ and __getattribute__ are probably more commonly used in
delegation-based code (as sketched earlier), where attribute access is validated and routed
to an embedded object. Where just a single attribute must be managed, properties and
descriptors might do as well or better.


Computed Attributes


As before, our prior example doesn’t really do anything but trace attribute fetches; it’s
not much more work to compute an attribute’s value when fetched. As for properties
and descriptors, the following creates a virtual attribute X that runs a calculation when
fetched:


class AttrSquare:
    def __init__(self, start):
        self.value = start                            # Triggers __setattr__!

    def __getattr__(self, attr):                      # On undefined attr fetch
        if attr == 'X':
            return self.value ** 2                    # value is not undefined
        else:
            raise AttributeError(attr)

    def __setattr__(self, attr, value):               # On all attr assignments
        if attr == 'X':
            attr = 'value'
        self.__dict__[attr] = value

A = AttrSquare(3)                # 2 instances of class with overloading
B = AttrSquare(32)               # Each has different state information

print(A.X)                       # 3 ** 2
A.X = 4
print(A.X)                       # 4 ** 2
print(B.X)                       # 32 ** 2


Running this code results in the same output that we got earlier when using properties 
and descriptors, but this script’s mechanics are based on generic attribute interception 
methods: 

9 


16 
1024 


As before, we can achieve the same effect with __getattribute__ instead of
__getattr__; the following replaces the fetch method with a __getattribute__ and
changes the __setattr__ assignment method to avoid looping by using direct superclass
method calls instead of __dict__ keys:


class AttrSquare:
    def __init__(self, start):
        self.value = start                            # Triggers __setattr__!

    def __getattribute__(self, attr):                 # On all attr fetches
        if attr == 'X':
            return self.value ** 2                    # Triggers __getattribute__ again!
        else:
            return object.__getattribute__(self, attr)

    def __setattr__(self, attr, value):               # On all attr assignments
        if attr == 'X':
            attr = 'value'
        object.__setattr__(self, attr, value)


When this version is run, the results are the same again. Notice the implicit routing
going on inside this class’s methods:


• self.value=start inside the constructor triggers __setattr__
• self.value inside __getattribute__ triggers __getattribute__ again


In fact, __getattribute__ is run twice each time we fetch attribute X. This doesn’t
happen in the __getattr__ version, because the value attribute is not undefined. If you care
about speed and want to avoid this, change __getattribute__ to use the superclass to
fetch value as well:


    def __getattribute__(self, attr):
        if attr == 'X':
            return object.__getattribute__(self, 'value') ** 2


Of course, this still incurs a call to the superclass method, but not an additional recur- 
sive call before we get there. Add print calls to these methods to trace how and when 
they run. 
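
As one way to carry out that tracing, the following sketch uses a counter instead of print
calls. The calls class attribute and the AttrSquareDirect subclass are additions made
here for illustration only, the __setattr__ method is omitted because it does not affect
the count, and Python 3.X is assumed:

```python
class AttrSquare:
    calls = 0                            # Class-level counter, for tracing only
    def __init__(self, start):
        self.value = start
    def __getattribute__(self, attr):
        AttrSquare.calls += 1            # Fetched through the class: not intercepted
        if attr == 'X':
            return self.value ** 2       # Runs __getattribute__ a second time
        return object.__getattribute__(self, attr)

class AttrSquareDirect(AttrSquare):
    def __getattribute__(self, attr):
        AttrSquare.calls += 1
        if attr == 'X':
            return object.__getattribute__(self, 'value') ** 2   # No second call
        return object.__getattribute__(self, attr)

A = AttrSquare(3)
AttrSquare.calls = 0
a_result = A.X
a_calls = AttrSquare.calls
print(a_result, a_calls)                 # 9 2

B = AttrSquareDirect(3)
AttrSquare.calls = 0
b_result = B.X
b_calls = AttrSquare.calls
print(b_result, b_calls)                 # 9 1
```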


__getattr__ and __getattribute__ Compared


To summarize the coding differences between __getattr__ and __getattribute__, the
following example uses both to implement three attributes: attr1 is a class attribute,
attr2 is an instance attribute, and attr3 is a virtual managed attribute computed when
fetched:


class GetAttr:
    attr1 = 1
    def __init__(self):
        self.attr2 = 2
    def __getattr__(self, attr):            # On undefined attrs only
        print('get: ' + attr)               # Not attr1: inherited from class
        return 3                            # Not attr2: stored on instance

X = GetAttr()
print(X.attr1)
print(X.attr2)
print(X.attr3)
print('-'*40)

class GetAttribute(object):                 # (object) needed in 2.6 only
    attr1 = 1
    def __init__(self):
        self.attr2 = 2
    def __getattribute__(self, attr):       # On all attr fetches
        print('get: ' + attr)               # Use superclass to avoid looping here
        if attr == 'attr3':
            return 3
        else:
            return object.__getattribute__(self, attr)

X = GetAttribute()
print(X.attr1)
print(X.attr2)
print(X.attr3)


When run, the __getattr__ version intercepts only attr3 accesses, because it is
undefined. The __getattribute__ version, on the other hand, intercepts all attribute fetches
and must route those it does not manage to the superclass fetcher to avoid loops:


1
2
get: attr3
3
----------------------------------------
get: attr1
1
get: attr2
2
get: attr3
3


Although __getattribute__ can catch more attribute fetches than __getattr__, in
practice they are often just variations on a theme: if attributes are not physically stored,
the two have the same effect.


Management Techniques Compared 


To summarize the coding differences in all four attribute management schemes we’ve
seen in this chapter, let’s quickly step through a more comprehensive
computed-attribute example using each technique. The following version uses
properties to intercept and calculate attributes named square and cube. Notice how their
base values are stored in names that begin with an underscore, so they don’t clash with
the names of the properties themselves:


# 2 dynamically computed attributes with properties

class Powers:
    def __init__(self, square, cube):
        self._square = square               # _square is the base value
        self._cube = cube                   # square is the property name

    def getSquare(self):
        return self._square ** 2
    def setSquare(self, value):
        self._square = value
    square = property(getSquare, setSquare)

    def getCube(self):
        return self._cube ** 3
    cube = property(getCube)

X = Powers(3, 4)
print(X.square)                             # 3 ** 2 = 9
print(X.cube)                               # 4 ** 3 = 64
X.square = 5
print(X.square)                             # 5 ** 2 = 25


To do the same with descriptors, we define the attributes with complete classes. Note 
that these descriptors store base values as instance state, so they must use leading un- 
derscores again so as not to clash with the names of descriptors (as we’ll see in the final 
example of this chapter, we could avoid this renaming requirement by storing base 
values as descriptor state instead): 


# Same, but with descriptors

class DescSquare:
    def __get__(self, instance, owner):
        return instance._square ** 2
    def __set__(self, instance, value):
        instance._square = value

class DescCube:
    def __get__(self, instance, owner):
        return instance._cube ** 3

class Powers:                               # Use (object) in 2.6
    square = DescSquare()
    cube = DescCube()
    def __init__(self, square, cube):
        self._square = square               # "self.square = square" works too,
        self._cube = cube                   # because it triggers desc __set__!

X = Powers(3, 4)
print(X.square)                             # 3 ** 2 = 9
print(X.cube)                               # 4 ** 3 = 64
X.square = 5
print(X.square)                             # 5 ** 2 = 25


To achieve the same result with __getattr__ fetch interception, we again store base
values with underscore-prefixed names so that accesses to managed names are
undefined and thus invoke our method; we also need to code a __setattr__ to intercept
assignments, and take care to avoid its potential for looping:


# Same, but with generic __getattr__ undefined attribute interception

class Powers:
    def __init__(self, square, cube):
        self._square = square
        self._cube = cube

    def __getattr__(self, name):
        if name == 'square':
            return self._square ** 2
        elif name == 'cube':
            return self._cube ** 3
        else:
            raise TypeError('unknown attr:' + name)

    def __setattr__(self, name, value):
        if name == 'square':
            self.__dict__['_square'] = value
        else:
            self.__dict__[name] = value

X = Powers(3, 4)
print(X.square)                             # 3 ** 2 = 9
print(X.cube)                               # 4 ** 3 = 64
X.square = 5
print(X.square)                             # 5 ** 2 = 25


The final option, coding this with __getattribute__, is similar to the prior version.
Because we catch every attribute now, though, we must route base value fetches to a
superclass to avoid looping:


# Same, but with generic __getattribute__ all attribute interception

class Powers:
    def __init__(self, square, cube):
        self._square = square
        self._cube = cube

    def __getattribute__(self, name):
        if name == 'square':
            return object.__getattribute__(self, '_square') ** 2
        elif name == 'cube':
            return object.__getattribute__(self, '_cube') ** 3
        else:
            return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        if name == 'square':
            self.__dict__['_square'] = value
        else:
            self.__dict__[name] = value

X = Powers(3, 4)
print(X.square)                             # 3 ** 2 = 9
print(X.cube)                               # 4 ** 3 = 64
X.square = 5
print(X.square)                             # 5 ** 2 = 25


As you can see, each technique takes a different form in code, but all four produce the 
same result when run: 


9
64
25

For more on how these alternatives compare, and other coding options, stay tuned for 
a more realistic application of them in the attribute validation example in the section 
“Example: Attribute Validations” on page 973. First, though, we need to study a 
pitfall associated with two of these tools. 


Intercepting Built-in Operation Attributes


When I introduced __getattr__ and __getattribute__, I stated that they intercept
undefined and all attribute fetches, respectively, which makes them ideal for
delegation-based coding patterns. While this is true for normally named attributes,
their behavior needs some additional clarification: for method-name attributes
implicitly fetched by built-in operations, these methods may not be run at all. This
means that operator overloading method calls cannot be delegated to wrapped objects
unless wrapper classes somehow redefine these methods themselves.


For example, attribute fetches for the __str__, __add__, and __getitem__ methods run
implicitly by printing, + expressions, and indexing, respectively, are not routed to the
generic attribute interception methods in 3.0. Specifically:


* In Python 3.0, neither __getattr__ nor __getattribute__ is run for such attributes.


* In Python 2.6, __getattr__ is run for such attributes if they are undefined in the
  class.


* In Python 2.6, __getattribute__ is available for new-style classes only and works
  as it does in 3.0.


In other words, in Python 3.0 classes (and 2.6 new-style classes), there is no direct way 
to generically intercept built-in operations like printing and addition. In Python 2.X, 
the methods such operations invoke are looked up at runtime in instances, like all other 
attributes; in Python 3.0 such methods are looked up in classes instead. 
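The class-versus-instance lookup difference can be seen in a small sketch, assuming
Python 3.X. The class name Plain and the attached lambdas are hypothetical, chosen only
for illustration: a __len__ attached to an instance is found by an explicit attribute
fetch, but the built-in len looks in the class and so ignores it.

```python
class Plain:                          # Hypothetical class, no __len__ defined
    pass

x = Plain()
x.__len__ = lambda: 5                 # Attach __len__ to the instance only

explicit = x.__len__()                # Explicit fetch: normal instance lookup

try:
    implicit = len(x)                 # Built-in: looks in the class, skips instance
except TypeError:
    implicit = 'TypeError'

print(explicit, implicit)

Plain.__len__ = lambda self: 7        # Define it in the class instead
print(len(x))                         # Now the built-in finds it
```

When run, the explicit call returns 5, the first len call fails with a TypeError, and
the second len call returns 7 once the method exists in the class.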


This change makes delegation-based coding patterns more complex in 3.0, since they 
cannot generically intercept operator overloading method calls and route them to an 
embedded object. This is not a showstopper—wrapper classes can work around this 
constraint by redefining all relevant operator overloading methods in the wrapper itself, 
in order to delegate calls. These extra methods can be added either manually, with 
tools, or by definition in and inheritance from common superclasses. This does, how- 
ever, make wrappers more work than they used to be when operator overloading 
methods are a part of a wrapped object’s interface. 
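The superclass option mentioned above might look something like the following sketch.
The names DelegateOps, Wrapper, and _wrapped are assumptions made for this illustration
only, and just a few operator methods are covered; a real wrapper would list whichever
operations its wrapped objects support.

```python
class DelegateOps:
    """Hypothetical mixin: reroutes a few built-in operations to self._wrapped."""
    def __str__(self):
        return str(self._wrapped)
    def __len__(self):
        return len(self._wrapped)
    def __getitem__(self, index):
        return self._wrapped[index]
    def __add__(self, other):
        return self._wrapped + other

class Wrapper(DelegateOps):
    def __init__(self, obj):
        self._wrapped = obj
    def __getattr__(self, name):              # Ordinary names: generic delegation
        return getattr(self._wrapped, name)

w = Wrapper([1, 2, 3])
print(len(w), w[0], w + [4])                  # Operators: via inherited methods
print(w)                                      # Printing: via inherited __str__
w.append(9)                                   # Normal name: via __getattr__
print(w)
```

Because the operator methods are found in a class, built-in operations work in 3.0,
while normally named attributes such as append still go through __getattr__.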


Keep in mind that this issue applies only to __getattr__ and __getattribute__. Because
properties and descriptors are defined for specific attributes only, they don’t really
apply to delegation-based classes at all: a single property or descriptor cannot be used
to intercept arbitrary attributes. Moreover, a class that defines both operator
overloading methods and attribute interception will work correctly, regardless of the
type of attribute interception defined. Our concern here is only with classes that do
not have operator overloading methods defined, but try to intercept them generically.


Consider the following example, the file getattr.py, which tests various attribute
types and built-in operations on instances of classes containing __getattr__ and
__getattribute__ methods:

class GetAttr:
    eggs = 88                        # eggs stored on class, spam on instance
    def __init__(self):
        self.spam = 77
    def __len__(self):               # len here, else __getattr__ called with __len__
        print('__len__: 42')
        return 42
    def __getattr__(self, attr):     # Provide __str__ if asked, else dummy func
        print('getattr: ' + attr)
        if attr == '__str__':
            return lambda *args: '[Getattr str]'
        else:
            return lambda *args: None

class GetAttribute(object):          # object required in 2.6, implied in 3.0
    eggs = 88                        # In 2.6 all are isinstance(object) auto
    def __init__(self):              # But must derive to get new-style tools,
        self.spam = 77               # incl __getattribute__, some __X__ defaults
    def __len__(self):
        print('__len__: 42')
        return 42
    def __getattribute__(self, attr):
        print('getattribute: ' + attr)
        if attr == '__str__':
            return lambda *args: '[GetAttribute str]'
        else:
            return lambda *args: None

for Class in GetAttr, GetAttribute:
    print('\n' + Class.__name__.ljust(50, '='))

    X = Class()
    X.eggs                   # Class attr
    X.spam                   # Instance attr
    X.other                  # Missing attr
    len(X)                   # __len__ defined explicitly

    try:                     # New-styles must support [], +, call directly: redefine
        X[0]                 # __getitem__?
    except:
        print('fail []')

    try:
        X + 99               # __add__?
    except:
        print('fail +')

    try:
        X()                  # __call__?  (implicit via built-in)
    except:
        print('fail ()')
    X.__call__()             # __call__?  (explicit, not inherited)

    print(X.__str__())       # __str__?   (explicit, inherited from type)
    print(X)                 # __str__?   (implicit via built-in)


When run under Python 2.6, __getattr__ does receive a variety of implicit attribute
fetches for built-in operations, because Python looks up such attributes in instances
normally. Conversely, __getattribute__ is not run for any of the operator overloading
names, because such names are looked up in classes only:


C:\misc> c:\python26\python getattr.py

GetAttr===========================================
getattr: other
__len__: 42
getattr: __getitem__
getattr: __coerce__
getattr: __add__
getattr: __call__
getattr: __call__
getattr: __str__
[Getattr str]
getattr: __str__
[Getattr str]

GetAttribute======================================
getattribute: eggs
getattribute: spam
getattribute: other
__len__: 42
fail []
fail +
fail ()
getattribute: __call__
getattribute: __str__
[GetAttribute str]
<__main__.GetAttribute object at 0x025EA1D0>


Note how __getattr__ intercepts both implicit and explicit fetches of __call__ and
__str__ in 2.6 here. By contrast, __getattribute__ fails to catch implicit fetches of
either attribute name for built-in operations.


Really, the __getattribute__ case is the same in 2.6 as it is in 3.0, because in 2.6
classes must be made new-style by deriving from object to use this method. This code’s
object derivation is optional in 3.0 because all classes are new-style.


When run under Python 3.0, though, results for __getattr__ differ: none of the
implicitly run operator overloading methods trigger either attribute interception method
when their attributes are fetched by built-in operations. Python 3.0 skips the normal
instance lookup mechanism when resolving such names:


C:\misc> c:\python30\python getattr.py

GetAttr===========================================
getattr: other
__len__: 42
fail []
fail +
fail ()
getattr: __call__
<__main__.GetAttr object at 0x025D17F0>
<__main__.GetAttr object at 0x025D17F0>

GetAttribute======================================
getattribute: eggs
getattribute: spam
getattribute: other
__len__: 42
fail []
fail +
fail ()
getattribute: __call__
getattribute: __str__
[GetAttribute str]
<__main__.GetAttribute object at 0x025D1870>


We can trace these outputs back to prints in the script to see how this works: 


* __str__ access fails to be caught twice by __getattr__ in 3.0: once for the built-in
  print, and once for explicit fetches because a default is inherited from the class
  (really, from the built-in object, which is a superclass to every class).


* __str__ fails to be caught only once by the __getattribute__ catchall, during the
  built-in print operation; explicit fetches bypass the inherited version.


* __call__ fails to be caught in both schemes in 3.0 for built-in call expressions, but
  it is intercepted by both when fetched explicitly; unlike with __str__, there is no
  inherited __call__ default to defeat __getattr__.


* __len__ is caught by both classes, simply because it is an explicitly defined method
  in the classes themselves; its name is not routed to either __getattr__ or
  __getattribute__ in 3.0 if we delete the class’s __len__ methods.


* All other built-in operations fail to be intercepted by both schemes in 3.0.


Again, the net effect is that operator overloading methods implicitly run by built-in 
operations are never routed through either attribute interception method in 3.0: Python 
3.0 searches for such attributes in classes and skips instance lookup entirely. 


This makes delegation-based wrapper classes more difficult to code in 3.0—if wrapped 
classes may contain operator overloading methods, those methods must be redefined 
redundantly in the wrapper class in order to delegate to the wrapped object. In general 
delegation tools, this can add many extra methods. 


Of course, the addition of such methods can be partly automated by tools that augment 
classes with new methods (the class decorators and metaclasses of the next two chapters 
might help here). Moreover, a superclass might be able to define all these extra methods 
once, for inheritance in delegation-based classes. Still, delegation coding patterns re- 
quire extra work in 3.0. 
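As a rough sketch of the automation idea, a class decorator can generate the delegating
methods from a list of names. Everything named here (delegate_ops, Wrapper, _wrapped) is
hypothetical and meant only to show the shape of such a tool; it is not the decorator
presented in the next chapters, and it ignores details such as reflected operators.

```python
def delegate_ops(*names):
    """Hypothetical class decorator: add methods that forward to self._wrapped."""
    def decorate(cls):
        for name in names:
            def method(self, *args, _name=name):      # Default saves each name
                return getattr(self._wrapped, _name)(*args)
            setattr(cls, name, method)
        return cls
    return decorate

@delegate_ops('__len__', '__getitem__', '__str__')
class Wrapper:
    def __init__(self, obj):
        self._wrapped = obj
    def __getattr__(self, name):                      # Normally named attributes
        return getattr(self._wrapped, name)

w = Wrapper('spam')
print(len(w), w[1], str(w))
```

The generated methods live in the class, so built-in operations find them in 3.0; the
_name default argument is needed so that each generated method remembers its own name
rather than the last one in the loop.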


For a more realistic illustration of this phenomenon as well as its workaround, see the
Private decorator example in the following chapter. There, we’ll see that it’s also
possible to insert a __getattribute__ in the client class to retain its original type,
although this method still won’t be called for operator overloading methods; printing
still runs a __str__ defined in such a class directly, for example, instead of routing
the request through __getattribute__.


As another example, the next section resurrects our class tutorial example. Now that 
you understand how attribute interception works, I’ll be able to explain one of its 
stranger bits. 


For an example of this 3.0 change at work in Python itself, see the discussion of the
3.0 os.popen object in Chapter 14. Because it is implemented with a wrapper that uses
__getattr__ to delegate attribute fetches to an embedded object, it does not intercept
the next(X) built-in iterator function in Python 3.0, which is defined to run __next__.
It does, however, intercept and delegate explicit X.__next__() calls, because these are
not routed through the built-in and are not inherited from a superclass like __str__ is.


This is equivalent to __call__ in our example: implicit calls for built-ins do not
trigger __getattr__, but explicit calls to names not inherited from the class type do.
In other words, this change impacts not only our delegators, but also those in the
Python standard library! Given the scope of this change, it’s possible that this
behavior may evolve in the future, so be sure to verify this issue in later releases.
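The next(X) behavior described above can be reproduced with a small stand-in wrapper,
assuming Python 3.X. The IterWrapper class is hypothetical and is not the os.popen
implementation; it only mimics the same delegation pattern.

```python
class IterWrapper:
    """Hypothetical wrapper that delegates via __getattr__ only."""
    def __init__(self, iterator):
        self._wrapped = iterator
    def __getattr__(self, name):
        return getattr(self._wrapped, name)

w = IterWrapper(iter([1, 2, 3]))

first = w.__next__()                  # Explicit fetch: routed to __getattr__
try:
    second = next(w)                  # Built-in: class lookup only
except TypeError:
    second = 'TypeError'
print(first, second)
```

The explicit call returns the first item, while the built-in next fails because no
__next__ is found in the wrapper's class.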


Delegation-Based Managers Revisited 


The object-oriented tutorial of Chapter 27 presented a Manager class that used object 
embedding and method delegation to customize its superclass, rather than inheritance. 
Here is the code again for reference, with some irrelevant testing removed: 


class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job  = job
        self.pay  = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)      # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)      # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)           # Delegate all other attrs
    def __str__(self):
        return str(self.person)                     # Must overload again (in 3.0)

if __name__ == '__main__':
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)    # Manager.__init__
    print(tom.lastName())                # Manager.__getattr__ -> Person.lastName
    tom.giveRaise(.10)                   # Manager.giveRaise -> Person.giveRaise
    print(tom)                           # Manager.__str__ -> Person.__str__


Comments at the end of this file show which methods are invoked for a line’s operation.
In particular, notice how lastName calls are undefined in Manager, and thus are routed
into the generic __getattr__ and from there on to the embedded Person object. Here
is the script’s output; Sue receives a 10% raise from Person, but Tom gets 20% because
giveRaise is customized in Manager:

C:\misc> c:\python30\python person.py
Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]


By contrast, though, notice what occurs when we print a Manager at the end of the script:
the wrapper class’s __str__ is invoked, and it delegates to the embedded Person object’s
__str__. With that in mind, watch what happens if we delete the Manager.__str__
method in this code:


# Delete the Manager __str__ method

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)      # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)      # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)           # Delegate all other attrs


Now printing does not route its attribute fetch through the generic __getattr__ inter- 
ceptor under Python 3.0 for Manager objects. Instead, a default __str__ display method 
inherited from the class’s implicit object superclass is looked up and run (sue still prints 
correctly, because Person has an explicit __str__): 

C:\misc> c:\python30\python person.py
Jones
[Person: Sue Jones, 110000]
Jones
<__main__.Manager object at 0x02A5AE30>


Curiously, running without a __str__ like this does trigger __getattr__ in Python 2.6,
because operator overloading attributes are routed through this method, and classes
do not inherit a default for __str__:

C:\misc> c:\python26\python person.py
Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]


Switching to __getattribute__ won’t help 3.0 here either; like __getattr__, it is not
run for operator overloading attributes implied by built-in operations in either Python
2.6 or 3.0:


# Replace __getattr__ with __getattribute__

class Manager:                                           # Use (object) in 2.6
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)           # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)           # Intercept and delegate
    def __getattribute__(self, attr):
        print('**', attr)
        if attr in ['person', 'giveRaise']:
            return object.__getattribute__(self, attr)   # Fetch my attrs
        else:
            return getattr(self.person, attr)            # Delegate all others


Regardless of which attribute interception method is used in 3.0, we still must include
a redefined __str__ in Manager (as shown above) in order to intercept printing
operations and route them to the embedded Person object:


C:\misc> c:\python30\python person.py
Jones
[Person: Sue Jones, 110000]
** lastName
** person
Jones
** giveRaise
** person
<__main__.Manager object at 0x028E0590>


Notice that __getattribute__ gets called twice here for methods: once for the method
name, and again for the self.person embedded object fetch. We could avoid that with
a different coding, but we would still have to redefine __str__ to catch printing, albeit
differently here (self.person would cause this __getattribute__ to fail):


# Code __getattribute__ differently to minimize extra calls

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)
    def __getattribute__(self, attr):
        print('**', attr)
        person = object.__getattribute__(self, 'person')
        if attr == 'giveRaise':
            return lambda percent: person.giveRaise(percent+.10)
        else:
            return getattr(person, attr)
    def __str__(self):
        person = object.__getattribute__(self, 'person')
        return str(person)


When this alternative runs, our object prints properly, but only because we’ve added
an explicit __str__ in the wrapper; this attribute is still not routed to our generic
attribute interception method:

Jones
[Person: Sue Jones, 110000]
** lastName
Jones
** giveRaise
[Person: Tom Jones, 60000]


The short story here is that delegation-based classes like Manager must redefine some
operator overloading methods (like __str__) to route them to embedded objects in
Python 3.0, but not in Python 2.6 unless new-style classes are used. Our only direct
options seem to be using __getattr__ and Python 2.6, or redefining operator
overloading methods in wrapper classes redundantly in 3.0.


Again, this isn’t an impossible task; many wrappers can predict the set of operator 
overloading methods required, and tools and superclasses can automate part of this 
task. Moreover, not all classes use operator overloading methods (indeed, most appli- 
cation classes usually should not). It is, however, something to keep in mind for dele- 
gation coding models used in Python 3.0; when operator overloading methods are part 
of an object’s interface, wrappers must accommodate them portably by redefining them 
locally. 


Example: Attribute Validations 


To close out this chapter, let’s turn to a more realistic example, coded in all four of our 
attribute management schemes. The example we will use defines a CardHolder object 
with four attributes, three of which are managed. The managed attributes validate or 
transform values when fetched or stored. All four versions produce the same results for 
the same test code, but they implement their attributes in very different ways. The 
examples are included largely for self-study; although I won’t go through their code in 
detail, they all use concepts we’ve already explored in this chapter. 


Using Properties to Validate 


Our first coding uses properties to manage three attributes. As usual, we could use 
simple methods instead of managed attributes, but properties help if we have been 
using attributes in existing code already. Properties run code automatically on attribute 
access, but are focused on a specific set of attributes; they cannot be used to intercept 
all attributes generically.


To understand this code, it’s crucial to notice that the attribute assignments inside the
__init__ constructor method trigger property setter methods too. When this method
assigns to self.name, for example, it automatically invokes the setName method, which
transforms the value and assigns it to an instance attribute called __name so it won’t
clash with the property’s name.


This renaming (sometimes called name mangling) is necessary because properties use 
common instance state and have none of their own. Data is stored in an attribute called 
__name, and the attribute called name is always a property, not data. 


In the end, this class manages attributes called name, age, and acct; allows the attribute 
addr to be accessed directly; and provides a read-only attribute called remain that is 
entirely virtual and computed on demand. For comparison purposes, this property- 
based coding weighs in at 39 lines of code: 

class CardHolder:
    acctlen = 8                                # Class data
    retireage = 59.5

    def __init__(self, acct, name, age, addr):
        self.acct = acct                       # Instance data
        self.name = name                       # These trigger prop setters too
        self.age  = age                        # __X mangled to have class name
        self.addr = addr                       # addr is not managed
                                               # remain has no data
    def getName(self):
        return self.__name
    def setName(self, value):
        value = value.lower().replace(' ', '_')
        self.__name = value
    name = property(getName, setName)

    def getAge(self):
        return self.__age
    def setAge(self, value):
        if value < 0 or value > 150:
            raise ValueError('invalid age')
        else:
            self.__age = value
    age = property(getAge, setAge)

    def getAcct(self):
        return self.__acct[:-3] + '***'
    def setAcct(self, value):
        value = value.replace('-', '')
        if len(value) != self.acctlen:
            raise TypeError('invalid acct number')
        else:
            self.__acct = value
    acct = property(getAcct, setAcct)

    def remainGet(self):                       # Could be a method, not attr
        return self.retireage - self.age       # Unless already using as attr
    remain = property(remainGet)
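The renaming of __name described earlier can be observed directly by inspecting an
instance's namespace. The following sketch uses a hypothetical, cut-down class called
Holder in place of CardHolder, with just one managed attribute.

```python
class Holder:                             # Hypothetical stand-in for CardHolder
    def __init__(self, name):
        self.name = name                  # Runs setName via the property
    def getName(self):
        return self.__name                # Really self._Holder__name
    def setName(self, value):
        self.__name = value.lower().replace(' ', '_')
    name = property(getName, setName)

h = Holder('Bob Smith')
print(h.name)
print(sorted(h.__dict__))                 # Shows the mangled storage name
print('name' in h.__dict__)               # The property lives in the class
```

The instance holds only the mangled _Holder__name key; the name attribute itself is the
property object stored in the class, which is why the two never clash.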


Self-test code 


The following code tests our class; add this to the bottom of your file, or place the class 
in a module and import it first. We’ll use this same testing code for all four versions of 
this example. When it runs, we make two instances of our managed-attribute class and 
fetch and change its various attributes. Operations expected to fail are wrapped in 
try statements: 


bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
print(bob.acct, bob.name, bob.age, bob.remain, bob.addr, sep=' / ')
bob.name = 'Bob Q. Smith'
bob.age = 50
bob.acct = '23-45-67-89'
print(bob.acct, bob.name, bob.age, bob.remain, bob.addr, sep=' / ')

sue = CardHolder('5678-12-34', 'Sue Jones', 35, '124 main st')
print(sue.acct, sue.name, sue.age, sue.remain, sue.addr, sep=' / ')
try:
    sue.age = 200
except:
    print('Bad age for Sue')

try:
    sue.remain = 5
except:
    print("Can't set sue.remain")

try:
    sue.acct = '1234567'
except:
    print('Bad acct for Sue')


Here is the output of our self-test code; again, this is the same for all versions of this 
example. Trace through this code to see how the class’s methods are invoked; accounts 
are displayed with some digits hidden, names are converted to a standard format, and 
time remaining until retirement is computed when fetched using a class attribute cutoff: 

12345*** / bob_smith / 40 / 19.5 / 123 main st
23456*** / bob_q._smith / 50 / 9.5 / 123 main st
56781*** / sue_jones / 35 / 24.5 / 124 main st
Bad age for Sue
Can't set sue.remain
Bad acct for Sue


Using Descriptors to Validate 


Now, let’s recode our example using descriptors instead of properties. As we’ve seen,
descriptors are very similar to properties in terms of functionality and roles; in fact,
properties are basically a restricted form of descriptor. Like properties, descriptors are
designed to handle specific attributes, not generic attribute access. Unlike properties,
descriptors have their own state, and they’re a more general scheme.


To understand this code, it’s again important to notice that the attribute assignments
inside the __init__ constructor method trigger descriptor __set__ methods. When the
constructor method assigns to self.name, for example, it automatically invokes the
Name.__set__() method, which transforms the value and assigns it to a descriptor
attribute called name.


Unlike in the prior property-based variant, though, in this case the actual name value is 
attached to the descriptor object, not the client class instance. Although we could store 
this value in either instance or descriptor state, the latter avoids the need to mangle 
names with underscores to avoid collisions. In the CardHolder client class, the attribute 
called name is always a descriptor object, not data. 


In the end, this class implements the same attributes as the prior version: it manages 
attributes called name, age, and acct; allows the attribute addr to be accessed directly; 
and provides a read-only attribute called remain that is entirely virtual and computed 
on demand. Notice how we must catch assignments to the remain name in its descriptor 
and raise an exception; as we learned earlier, if we did not do this, assigning to this 
attribute of an instance would silently create an instance attribute that hides the class 
attribute descriptor. For comparison purposes, this descriptor-based coding takes 45 
lines of code: 

class CardHolder:
    acctlen = 8                                # Class data
    retireage = 59.5

    def __init__(self, acct, name, age, addr):
        self.acct = acct                       # Instance data
        self.name = name                       # These trigger __set__ calls too
        self.age  = age                        # __X not needed: in descriptor
        self.addr = addr                       # addr is not managed
                                               # remain has no data
    class Name:
        def __get__(self, instance, owner):    # Class names: CardHolder locals
            return self.name
        def __set__(self, instance, value):
            value = value.lower().replace(' ', '_')
            self.name = value
    name = Name()

    class Age:
        def __get__(self, instance, owner):
            return self.age                    # Use descriptor data
        def __set__(self, instance, value):
            if value < 0 or value > 150:
                raise ValueError('invalid age')
            else:
                self.age = value
    age = Age()

    class Acct:
        def __get__(self, instance, owner):
            return self.acct[:-3] + '***'
        def __set__(self, instance, value):
            value = value.replace('-', '')
            if len(value) != instance.acctlen:         # Use instance class data
                raise TypeError('invalid acct number')
            else:
                self.acct = value
    acct = Acct()

    class Remain:
        def __get__(self, instance, owner):
            return instance.retireage - instance.age   # Triggers Age.__get__
        def __set__(self, instance, value):
            raise TypeError('cannot set remain')       # Else set allowed here
    remain = Remain()
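One consequence of keeping values on the descriptor object is worth seeing in isolation:
a descriptor is a class attribute, so its state is shared by every instance of the
client class. The self-test shown earlier does not reveal this, because bob's values are
printed before sue is created. The sketch below contrasts the two storage choices the
text mentions; the names SharedName, InstanceName, Holder, and _name are hypothetical
and used only for illustration.

```python
class SharedName:                         # State kept on the descriptor itself
    def __get__(self, instance, owner):
        return self.name
    def __set__(self, instance, value):
        self.name = value

class InstanceName:                       # State kept on the client instance
    def __get__(self, instance, owner):
        return instance._name
    def __set__(self, instance, value):
        instance._name = value

class Holder:                             # Hypothetical stand-in for CardHolder
    shared = SharedName()
    own = InstanceName()
    def __init__(self, name):
        self.shared = name                # Both trigger __set__ calls
        self.own = name

bob = Holder('bob')
sue = Holder('sue')
print(bob.shared, sue.shared)             # Descriptor state: last store wins
print(bob.own, sue.own)                   # Instance state: one value each
```

If per-instance values matter, storing them on the instance (under a name that differs
from the descriptor's own, as _name does here) is probably the safer of the two options.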


Using __getattr__ to Validate


As we’ve seen, the __getattr__ method intercepts all undefined attributes, so it can be
more generic than using properties or descriptors. For our example, we simply test the
attribute name to know when a managed attribute is being fetched; others are stored
physically on the instance and so never reach __getattr__. Although this approach is
more general than using properties or descriptors, extra work may be required to imitate
the specific-attribute focus of other tools. We need to check names at runtime, and we
must code a __setattr__ in order to intercept and validate attribute assignments.


As for the property and descriptor versions of this example, it’s critical to notice that
the attribute assignments inside the __init__ constructor method trigger the class’s
__setattr__ method too. When this method assigns to self.name, for example, it
automatically invokes the __setattr__ method, which transforms the value and assigns
it to an instance attribute called name. By storing name on the instance, it ensures that
future accesses will not trigger __getattr__. In contrast, acct is stored as _acct, so that
later accesses to acct do invoke __getattr__.


In the end, this class, like the prior two, manages attributes called name, age, and 
acct; allows the attribute addr to be accessed directly; and provides a read-only attribute 
called remain that is entirely virtual and is computed on demand. 


For comparison purposes, this alternative comes in at 32 lines of code: 7 fewer than
the property-based version, and 13 fewer than the version using descriptors. Clarity
matters more than code size, of course, but extra code can sometimes imply extra
development and maintenance work. Probably more important here are roles: generic
tools like __getattr__ may be better suited to generic delegation, while properties and
descriptors are more directly designed to manage specific attributes.


Also note that the code here incurs extra calls when setting unmanaged attributes (e.g.,
addr), although no extra calls are incurred for fetching unmanaged attributes, since they
are defined. Though this will likely result in negligible overhead for most programs,
properties and descriptors incur an extra call only when managed attributes are
accessed.


Here’s the __getattr__ version of our code:


class CardHolder:
    acctlen = 8                                  # Class data
    retireage = 59.5

    def __init__(self, acct, name, age, addr):
        self.acct = acct                         # Instance data
        self.name = name                         # These trigger __setattr__ too
        self.age  = age                          # _acct not mangled: name tested
        self.addr = addr                         # addr is not managed
                                                 # remain has no data
    def __getattr__(self, name):
        if name == 'acct':                           # On undefined attr fetches
            return self._acct[:-3] + '***'           # name, age, addr are defined
        elif name == 'remain':
            return self.retireage - self.age         # Doesn't trigger __getattr__
        else:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == 'name':                           # On all attr assignments
            value = value.lower().replace(' ', '_')  # addr stored directly
        elif name == 'age':                          # acct mangled to _acct
            if value < 0 or value > 150:
                raise ValueError('invalid age')
        elif name == 'acct':
            name  = '_acct'
            value = value.replace('-', '')
            if len(value) != self.acctlen:
                raise TypeError('invalid acct number')
        elif name == 'remain':
            raise TypeError('cannot set remain')
        self.__dict__[name] = value                  # Avoid looping


Using __getattribute__ to Validate 


Our final variant uses the __getattribute__ catchall to intercept attribute fetches and
manage them as needed. Every attribute fetch is caught here, so we test the attribute
names to detect managed attributes and route all others to the superclass for normal
fetch processing. This version uses the same __setattr__ to catch assignments as the
prior version.


The code works very much like the __getattr__ version, so I won’t repeat the full
description here. Note, though, that because every attribute fetch is routed to
__getattribute__, we don’t need to mangle names to intercept them here (acct is stored
as acct). On the other hand, this code must take care to route nonmanaged attribute
fetches to a superclass to avoid looping.


Also notice that this version incurs extra calls for both setting and fetching unmanaged
attributes (e.g., addr); if speed is paramount, this alternative may be the slowest of the
bunch. For comparison purposes, this version amounts to 32 lines of code, just like the
prior version:

class CardHolder:
    acctlen = 8                                  # Class data
    retireage = 59.5

    def __init__(self, acct, name, age, addr):
        self.acct = acct                         # Instance data
        self.name = name                         # These trigger __setattr__ too
        self.age  = age                          # acct not mangled: name tested
        self.addr = addr                         # addr is not managed
                                                 # remain has no data
    def __getattribute__(self, name):
        superget = object.__getattribute__           # Don't loop: one level up
        if name == 'acct':                           # On all attr fetches
            return superget(self, 'acct')[:-3] + '***'
        elif name == 'remain':
            return superget(self, 'retireage') - superget(self, 'age')
        else:
            return superget(self, name)              # name, age, addr: stored

    def __setattr__(self, name, value):
        if name == 'name':                           # On all attr assignments
            value = value.lower().replace(' ', '_')  # addr stored directly
        elif name == 'age':
            if value < 0 or value > 150:
                raise ValueError('invalid age')
        elif name == 'acct':
            value = value.replace('-', '')
            if len(value) != self.acctlen:
                raise TypeError('invalid acct number')
        elif name == 'remain':
            raise TypeError('cannot set remain')
        self.__dict__[name] = value                  # Avoid loops, orig names
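The looping hazard that the superclass routing guards against can be shown with two
tiny classes. Both are hypothetical sketches, assuming a Python 3.X release recent
enough to define RecursionError: Loops fetches self.__dict__ inside __getattribute__,
which triggers the same method again, while Safe routes the fetch one level up.

```python
class Loops:
    """Hypothetical class: fetches self.__dict__ inside __getattribute__."""
    def __init__(self):
        self.addr = '123 main st'
    def __getattribute__(self, name):
        return self.__dict__[name]                    # self.__dict__ reruns this method

class Safe:
    def __init__(self):
        self.addr = '123 main st'
    def __getattribute__(self, name):
        return object.__getattribute__(self, name)    # Skip this level

try:
    Loops().addr
    result = 'ok'
except RecursionError:
    result = 'RecursionError'

print(result, Safe().addr)
```

Only the fetch loops here; the assignments in __init__ go through __setattr__, which
neither class redefines.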


Be sure to study and run this section’s code on your own for more pointers on managed 
attribute coding techniques. 


Chapter Summary 


This chapter covered the various techniques for managing access to attributes in
Python, including the __getattr__ and __getattribute__ operator overloading methods,
class properties, and attribute descriptors. Along the way, it compared and contrasted
these tools and presented a handful of use cases to demonstrate their behavior.


Chapter 38 continues our tool-building survey with a look at decorators: code run
automatically at function and class creation time, rather than on attribute access. Before
we continue, though, let’s work through a set of questions to review what we’ve covered
here.


Test Your Knowledge: Quiz


1. How do __getattr__ and __getattribute__ differ?

2. How do properties and descriptors differ?

3. How are properties and decorators related?

4. What are the main functional differences between __getattr__ and __getattribute__
   and properties and descriptors?

5. Isn’t all this feature comparison just a kind of argument?


Test Your Knowledge: Answers 


1. The __getattr__ method is run for fetches of undefined attributes only (i.e., those
not present on an instance and not inherited from any of its classes). By contrast,
the __getattribute__ method is called for every attribute fetch, whether the
attribute is defined or not. Because of this, code inside a __getattr__ can freely fetch
other attributes if they are defined, whereas __getattribute__ must use special code
for all such attribute fetches to avoid looping (it must route fetches to a superclass
to skip itself).


2. Properties serve a specific role, while descriptors are more general. Properties define
get, set, and delete functions for a specific attribute; descriptors provide a class
with methods for these actions, too, but they provide extra flexibility to support
more arbitrary actions. In fact, properties are really a simple way to create a specific
kind of descriptor: one that runs functions on attribute accesses. Coding differs
too: a property is created with a built-in function, and a descriptor is coded with
a class; as such, descriptors can leverage all the usual OOP features of classes, such
as inheritance. Moreover, in addition to the instance’s state information, descriptors
have local state of their own, so they can avoid name collisions in the instance.


3. Properties can be coded with decorator syntax. Because the property built-in accepts
a single function argument, it can be used directly as a function decorator to
define a fetch access property. Due to the name rebinding behavior of decorators,
the name of the decorated function is assigned to a property whose get accessor is
set to the original function decorated (name = property(name)). Property setter
and deleter attributes allow us to further add set and delete accessors with decoration
syntax: they set the accessor to the decorated function and return the augmented
property.


4. The __getattr__ and __getattribute__ methods are more generic: they can be used
to catch arbitrarily many attributes. In contrast, each property or descriptor provides
access interception for only one specific attribute; we can’t catch every attribute
fetch with a single property or descriptor. On the other hand, properties
and descriptors handle both attribute fetch and assignment by design:
__getattr__ and __getattribute__ handle fetches only; to intercept assignments
as well, __setattr__ must also be coded. The implementation is also different:
__getattr__ and __getattribute__ are operator overloading methods, whereas
properties and descriptors are objects manually assigned to class attributes.


5. No it isn’t. To quote from Python namesake Monty Python’s Flying Circus: 


An argument is a connected series of statements intended to establish a 
proposition. 

No it isn't. 

Yes it is! It's not just contradiction. 

Look, if I argue with you, I must take up a contrary position. 

Yes, but that's not just saying "No it isn't." 

Yes it is! 

No it isn't! 

Yes it is! 

No it isn't. Argument is an intellectual process. Contradiction is just 
the automatic gainsaying of any statement the other person makes. 
(short pause) 

No it isn't. 

It is. 

Not at all. 

Now look... 


CHAPTER 38
Decorators 


In the advanced class topics chapter of this book (Chapter 31), we met static and class 
methods and took a quick look at the @ decorator syntax Python offers for declaring 
them. We also met function decorators briefly in the prior chapter (Chapter 37), while 
exploring the property built-in’s ability to serve as a decorator, and in Chapter 28 while 
studying the notion of abstract superclasses. 


This chapter picks up where the previous decorator coverage left off. Here, we’ll dig 
deeper into the inner workings of decorators and study more advanced ways to code 
new decorators ourselves. As we’ll see, many of the concepts we studied in earlier 
chapters, such as state retention, show up regularly in decorators. 


This is a somewhat advanced topic, and decorator construction tends to be of more 
interest to tool builders than to application programmers. Still, given that decorators 
are becoming increasingly common in popular Python frameworks, a basic under- 
standing can help demystify their role, even if you’re just a decorator user. 


Besides covering decorator construction details, this chapter serves as a more realistic 
case study of Python in action. Because its examples are somewhat larger than most of 
the others we’ve seen in this book, they better illustrate how code comes together into 
more complete systems and tools. As an extra perk, much of the code we’ll write here 
may be used as general-purpose tools in your day-to-day programs. 


What's a Decorator? 


Decoration is a way to specify management code for functions and classes. Decorators 
themselves take the form of callable objects (e.g., functions) that process other callable 
objects. As we saw earlier in this book, Python decorators come in two related flavors: 


• Function decorators do name rebinding at function definition time, providing a
layer of logic that can manage functions and methods, or later calls to them.


• Class decorators do name rebinding at class definition time, providing a layer of
logic that can manage classes, or the instances created by calling them later.


In short, decorators provide a way to insert automatically run code at the end of function
and class definition statements—at the end of a def for function decorators, and at the 
end of a class for class decorators. Such code can play a variety of roles, as described 
in the following sections. 


Managing Calls and Instances 


For example, in typical use, this automatically run code may be used to augment calls 
to functions and classes. It arranges this by installing wrapper objects to be invoked later: 


• Function decorators install wrapper objects to intercept later function calls and
process them as needed.


• Class decorators install wrapper objects to intercept later instance creation calls
and process them as required.


Decorators achieve these effects by automatically rebinding function and class names 
to other callables, at the end of def and class statements. When later invoked, these 
callables can perform tasks such as tracing and timing function calls, managing access 
to class instance attributes, and so on. 


Managing Functions and Classes 


Although most examples in this chapter deal with using wrappers to intercept later 
calls to functions and classes, this is not the only way decorators can be used: 


• Function decorators can also be used to manage function objects, instead of later
calls to them (to register a function to an API, for instance). Our primary focus
here, though, will be on their more commonly used call wrapper application.


• Class decorators can also be used to manage class objects directly, instead of instance
creation calls (to augment a class with new methods, for example). Because
this role intersects strongly with that of metaclasses (indeed, both run at the end
of the class creation process), we’ll see additional use cases in the next chapter.


In other words, function decorators can be used to manage both function calls and 
function objects, and class decorators can be used to manage both class instances and 
classes themselves. By returning the decorated object itself instead of a wrapper, dec- 
orators become a simple post-creation step for functions and classes. 


Regardless of the role they play, decorators provide a convenient and explicit way to 
code tools useful both during program development and in live production systems. 


Using and Defining Decorators 


Depending on your job description, you might encounter decorators as a user or a 
provider. As we’ve seen, Python itself comes with built-in decorators that have spe- 
cialized roles—static method declaration, property creation, and more. In addition, 
many popular Python toolkits include decorators to perform tasks such as managing
database or user-interface logic. In such cases, we can get by without knowing how the 
decorators are coded. 


For more general tasks, programmers can code arbitrary decorators of their own. For 
example, function decorators may be used to augment functions with code that adds 
call tracing, performs argument validity testing during debugging, automatically acquires
and releases thread locks, times calls made to functions for optimization, and so
on. Any behavior you can imagine adding to a function call is a candidate for custom 
function decorators. 


On the other hand, function decorators are designed to augment only a specific function 
or method call, not an entire object interface. Class decorators fill the latter role better— 
because they can intercept instance creation calls, they can be used to implement ar- 
bitrary object interface augmentation or management tasks. For example, custom class 
decorators can trace or validate every attribute reference made for an object. They can 
also be used to implement proxy objects, singleton classes, and other common coding 
patterns. In fact, we’ll find that many class decorators bear a strong resemblance to the 
delegation coding pattern we met in Chapter 30. 


Why Decorators? 


Like many advanced Python tools, decorators are never strictly required from a purely 
technical perspective: their functionality can often be implemented instead using sim- 
ple helper function calls or other techniques (and at a base level, we can always manually 
code the name rebinding that decorators perform automatically). 


That said, decorators provide an explicit syntax for such tasks, which makes intent 
clearer, can minimize augmentation code redundancy, and may help ensure correct 
API usage: 


• Decorators have a very explicit syntax, which makes them easier to spot than helper
function calls that may be arbitrarily far-removed from the subject functions or
classes.


• Decorators are applied once, when the subject function or class is defined; it’s not
necessary to add extra code (which may have to be changed in the future) at every
call to the class or function.


• Because of both of the prior points, decorators make it less likely that a user of an
API will forget to augment a function or class according to API requirements.


In other words, beyond their technical model, decorators offer some advantages in 
terms of code maintenance and aesthetics. Moreover, as structuring tools, decorators 
naturally foster encapsulation of code, which reduces redundancy and makes future 
changes easier. 


Decorators do have some potential drawbacks, too: when they insert wrapper logic,
they can alter the types of the decorated objects, and they may incur extra calls. On the 
other hand, the same considerations apply to any technique that adds wrapping logic 
to objects. 


We'll explore these tradeoffs in the context of real code later in this chapter. Although 
the choice to use decorators is still somewhat subjective, their advantages are compel- 
ling enough that they are quickly becoming best practice in the Python world. To help 
you decide for yourself, let’s turn to the details. 


The Basics 


Let’s get started with a first-pass look at decoration behavior from a symbolic perspec- 
tive. We’ll write real code soon, but since most of the magic of decorators boils down 
to an automatic rebinding operation, it’s important to understand this mapping first. 


Function Decorators 


Function decorators have been available in Python since version 2.4. As we saw earlier
in this book, they are largely just syntactic sugar that runs one function through another 
at the end of a def statement, and rebinds the original function name to the result. 


Usage 


A function decorator is a kind of runtime declaration about the function whose defini- 
tion follows. The decorator is coded on a line just before the def statement that defines 
a function or method, and it consists of the @ symbol followed by a reference to a 
metafunction—a function (or other callable object) that manages another function. 


In terms of code, function decorators automatically map the following syntax: 


@decorator                # Decorate function
def F(arg):
    ...
F(99)                     # Call function


into this equivalent form, where decorator is a one-argument callable object that re- 
turns a callable object with the same number of arguments as F: 


def F(arg):
    ...
F = decorator(F)          # Rebind function name to decorator result
F(99)                     # Essentially calls decorator(F)(99)


This automatic name rebinding works on any def statement, whether it’s for a simple
function or a method within a class. When the function F is later called, it’s actually 
calling the object returned by the decorator, which may be either another object that 
implements required wrapping logic, or the original function itself. 
In other words, decoration essentially maps the first of the following into the second 
(though the decorator is really run only once, at decoration time): 

func(6, 7) 

decorator(func)(6, 7) 


This automatic name rebinding accounts for the static method and property decoration 
syntax we met earlier in the book: 


class C:
    @staticmethod
    def meth(...): ...            # meth = staticmethod(meth)

class C:
    @property
    def name(self): ...           # name = property(name)


In both cases, the method name is rebound to the result of a built-in function decorator, 
at the end of the def statement. Calling the original name later invokes whatever object 
the decorator returns. 
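

To make the rebinding concrete, here is a minimal sketch that applies the same decorator twice: once with the @ syntax, and once by hand. The names shout, greet1, and greet2 are made up for this illustration only. Both function names end up bound to the object the decorator returns:

```python
def shout(func):                          # Hypothetical decorator: uppercases results
    def wrapper(*args):
        return func(*args).upper()
    return wrapper

@shout                                    # greet1 = shout(greet1)
def greet1(name):
    return 'hello ' + name

def greet2(name):
    return 'hello ' + name
greet2 = shout(greet2)                    # Manual rebinding: same effect

print(greet1('bob'))                      # Runs shout's wrapper
print(greet2('bob'))                      # Runs shout's wrapper too
```

In both cases the original function is reachable only through the wrapper once the name has been rebound.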


Implementation 


A decorator itself is a callable that returns a callable. That is, it returns the object to be 
called later when the decorated function is invoked through its original name—either 
a wrapper object to intercept later calls, or the original function augmented in some 
way. In fact, decorators can be any type of callable and return any type of callable: any 
combination of functions and classes may be used, though some are better suited to 
certain contexts. 


For example, to tap into the decoration protocol in order to manage a function just 
after it is created, we might code a decorator of this form: 


def decorator(F):
    # Process function F
    return F

@decorator
def func(): ...                   # func = decorator(func)


Because the original decorated function is assigned back to its name, this simply adds 
a post-creation step to function definition. Such a structure might be used to register a 
function to an API, assign function attributes, and so on. 
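

As one possible sketch of this post-creation role, the following decorator records each decorated function in a table and tags it with an attribute, then returns the original function unchanged. The registry dictionary and the register name are assumptions made for this example, not part of any real API:

```python
registry = {}                             # Hypothetical API table: name -> function

def register(func):
    registry[func.__name__] = func        # Record the function at def time
    func.registered = True                # Assign a function attribute
    return func                           # Return the original, not a wrapper

@register
def spam(x):
    return x ** 2

@register
def ham(x):
    return x ** 3

print(sorted(registry))                   # Names collected at decoration time
print(registry['spam'](3), spam(3))       # Same function object either way
```

Because no wrapper is inserted, later calls to spam and ham incur no extra call overhead, and their types are unchanged.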


In more typical use, to insert logic that intercepts later calls to a function, we might
code a decorator to return a different object than the original function: 


def decorator(F):
    # Save or use function F
    # Return a different callable: nested def, class with __call__, etc.

@decorator
def func(): ...                   # func = decorator(func)


This decorator is invoked at decoration time, and the callable it returns is invoked when 
the original function name is later called. The decorator itself receives the decorated 
function; the callable returned receives whatever arguments are later passed to the 
decorated function’s name. This works the same for class methods: the implied instance 
object simply shows up in the first argument of the returned callable. 


In skeleton terms, here’s one common coding pattern that captures this idea—the dec- 
orator returns a wrapper that retains the original function in an enclosing scope: 


def decorator(F):                         # On @ decoration
    def wrapper(*args):                   # On wrapped function call
        # Use F and args
        # F(*args) calls original function
    return wrapper

@decorator                                # func = decorator(func)
def func(x, y):                           # func is passed to decorator's F
    ...
func(6, 7)                                # 6, 7 are passed to wrapper's *args


When the name func is later called, it really invokes the wrapper function returned by 
decorator; the wrapper function can then run the original func because it is still available 
in an enclosing scope. When coded this way, each decorated function produces a new 
scope to retain state. 
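

Filled in with working logic, the skeleton might look like the following sketch. The decorator name counted and the calls list are illustrative choices; the list is mutated in place so that the code does not depend on the 3.X-only nonlocal statement:

```python
def counted(F):                           # Run once per decorated def
    calls = []                            # Per-function state in enclosing scope
    def wrapper(*args):
        calls.append(args)                # Record each call's arguments
        return F(*args)                   # Run the original function
    wrapper.calls = calls                 # Expose the state for inspection
    return wrapper

@counted
def add(x, y):
    return x + y

@counted
def mul(x, y):
    return x * y

print(add(6, 7), mul(6, 7))
print(len(add.calls), len(mul.calls))     # Each function has its own scope
```

Each decoration runs counted again, so add and mul get separate calls lists rather than sharing one.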


To do the same with classes, we can overload the call operation and use instance at- 
tributes instead of enclosing scopes: 


class decorator:
    def __init__(self, func):             # On @ decoration
        self.func = func
    def __call__(self, *args):            # On wrapped function call
        # Use self.func and args
        # self.func(*args) calls original function

@decorator
def func(x, y):                           # func = decorator(func)
    ...                                   # func is passed to __init__
func(6, 7)                                # 6, 7 are passed to __call__'s *args


When the name func is later called now, it really invokes the __call__ operator overloading
method of the instance created by decorator; the __call__ method can then
run the original func because it is still available in an instance attribute. When coded
this way, each decorated function produces a new instance to retain state.
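

A runnable version of this class-based skeleton could be sketched as follows, with a call counter kept in an instance attribute. The class name counted is again just an illustrative choice:

```python
class counted(object):
    def __init__(self, func):             # On @ decoration: one instance per function
        self.func = func
        self.calls = 0
    def __call__(self, *args):            # On later calls to the decorated name
        self.calls += 1
        return self.func(*args)

@counted
def add(x, y):                            # add = counted(add)
    return x + y

@counted
def mul(x, y):                            # mul = counted(mul)
    return x * y

add(6, 7); add(1, 2); mul(6, 7)
print(add.calls, mul.calls)               # Separate counters per instance
```

Here add and mul are two different counted instances, so their counters advance independently.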


Supporting method decoration 


One subtle point about the prior class-based coding is that while it works to intercept 
simple function calls, it does not quite work when applied to class method functions: 


class decorator:
    def __init__(self, func):             # func is method without instance
        self.func = func
    def __call__(self, *args):            # self is decorator instance
        # self.func(*args) fails!         # C instance not in args!

class C:
    @decorator
    def method(self, x, y):               # method = decorator(method)
        ...                               # Rebound to decorator instance


When coded this way, the decorated method is rebound to an instance of the decorator 
class, instead of a simple function. 


The problem with this is that the self in the decorator’s __call__ receives the
decorator class instance when the method is later run, and the instance of class C is 
never included in *args. This makes it impossible to dispatch the call to the original 
method—the decorator object retains the original method function, but it has no in- 
stance to pass to it. 
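

To see the failure in action, the following sketch fills in the skeleton and catches the resulting error. The exact wording of the error message varies across Python versions, so none is shown here:

```python
class decorator(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, *args):
        return self.func(*args)           # For methods: no C instance in args

class C(object):
    @decorator
    def method(self, x, y):               # method = decorator(method)
        return x + y

X = C()
try:
    X.method(6, 7)                        # Runs __call__ with args (6, 7) only
except TypeError as exc:
    print('failed:', exc)                 # method is short one argument
```

The call reaches the original method with 6 in self and 7 in x, leaving y unfilled, which is why Python raises a TypeError.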


To support both functions and methods, the nested function alternative works better: 


def decorator(F):                         # F is func or method without instance
    def wrapper(*args):                   # class instance in args[0] for method
        # F(*args) runs func or method
    return wrapper

@decorator
def func(x, y):                           # func = decorator(func)
    ...
func(6, 7)                                # Really calls wrapper(6, 7)

class C:
    @decorator
    def method(self, x, y):               # method = decorator(method)
        ...                               # Rebound to simple function

X = C()
X.method(6, 7)                            # Really calls wrapper(X, 6, 7)


When coded this way, wrapper receives the C class instance in its first argument, so it
can dispatch to the original method and access state information. 


Technically, this nested-function version works because Python creates a bound 
method object and thus passes the subject class instance to the self argument only 
when a method attribute references a simple function; when it references an instance 
of a callable class instead, the callable class’s instance is passed to self to give the
callable class access to its own state information. We’ll see how this subtle difference 
can matter in more realistic examples later in this chapter. 


Also note that nested functions are perhaps the most straightforward way to support 
decoration of both functions and methods, but not necessarily the only way. The prior 
chapter’s descriptors, for example, receive both the descriptor and subject class instance 
when called. Though more complex, later in this chapter we’ll see how this tool can be 
leveraged in this context as well. 
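

As a rough preview of the descriptor alternative, here is one way it might be coded. This is only a sketch under the assumption that a __get__ method is added to the class-based decorator; the version developed later in the chapter may differ in its details:

```python
class decorator(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, *args):                # Simple function calls land here
        return self.func(*args)
    def __get__(self, instance, owner):       # Method attribute fetches land here
        def bound(*args):
            return self(instance, *args)      # Pass the subject instance along
        return bound

@decorator
def func(x, y):                               # Module-level name: __get__ not used
    return x + y

class C(object):
    def __init__(self):
        self.base = 100
    @decorator
    def method(self, x, y):                   # Class attribute: __get__ is used
        return self.base + x + y

print(func(6, 7))
print(C().method(6, 7))
```

Because __get__ receives both the decorator instance (self) and the subject class instance (instance), it can forward the latter to the original method.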


Class Decorators 


Function decorators proved so useful that the model was extended to allow class dec- 
oration in Python 2.6 and 3.0. Class decorators are strongly related to function deco- 
rators; in fact, they use the same syntax and very similar coding patterns. Rather than 
wrapping individual functions or methods, though, class decorators are a way to man- 
age classes, or wrap up instance construction calls with extra logic that manages or 
augments instances created from a class. 


Usage 


Syntactically, class decorators appear just before class statements (just as function 
decorators appear just before function definitions). In symbolic terms, assuming that 
decorator is a one-argument function that returns a callable, the class decorator syntax: 


@decorator                # Decorate class
class C:
    ...
x = C(99)                 # Make an instance


is equivalent to the following—the class is automatically passed to the decorator func- 
tion, and the decorator’s result is assigned back to the class name: 


class C:
    ...
C = decorator(C)          # Rebind class name to decorator result
x = C(99)                 # Essentially calls decorator(C)(99)


The net effect is that calling the class name later to create an instance winds up triggering 
the callable returned by the decorator, instead of calling the original class itself. 


Implementation 


New class decorators are coded using many of the same techniques used for function 
decorators. Because a class decorator is also a callable that returns a callable, most 
combinations of functions and classes suffice. 


However it’s coded, the decorator’s result is what runs when an instance is later created.
For example, to simply manage a class just after it is created, return the original class 
itself: 

def decorator(C):
    # Process class C
    return C

@decorator
class C: ...                      # C = decorator(C)


To instead insert a wrapper layer that intercepts later instance creation calls, return a 
different callable object: 


def decorator(C):
    # Save or use class C
    # Return a different callable: nested def, class with __call__, etc.

@decorator
class C: ...                      # C = decorator(C)


The callable returned by such a class decorator typically creates and returns a new 
instance of the original class, augmented in some way to manage its interface. For 
example, the following inserts an object that intercepts undefined attributes of a class 
instance: 


def decorator(cls):                             # On @ decoration
    class Wrapper:
        def __init__(self, *args):              # On instance creation
            self.wrapped = cls(*args)
        def __getattr__(self, name):            # On attribute fetch
            return getattr(self.wrapped, name)
    return Wrapper

@decorator
class C:                                        # C = decorator(C)
    def __init__(self, x, y):                   # Run by Wrapper.__init__
        self.attr = 'spam'

x = C(6, 7)                                     # Really calls Wrapper(6, 7)
print(x.attr)                                   # Runs Wrapper.__getattr__, prints "spam"


In this example, the decorator rebinds the class name to another class, which retains 
the original class in an enclosing scope and creates and embeds an instance of the 
original class when it’s called. When an attribute is later fetched from the instance, it 
is intercepted by the wrapper’s __getattr__ and delegated to the embedded instance 
of the original class. Moreover, each decorated class creates a new scope, which re- 
members the original class. We’ll flesh out this example into some more useful code 
later in this chapter. 


Like function decorators, class decorators are commonly coded as either “factory”
functions that create and return callables, classes that use __init__ or __call__ methods
to intercept call operations, or some combination thereof. Factory functions typically 
retain state in enclosing scope references, and classes in attributes. 


Supporting multiple instances 


As with function decorators, with class decorators some callable type combinations 
work better than others. Consider the following invalid alternative to the class deco- 
rator of the prior example: 


class Decorator:
    def __init__(self, C):                      # On @ decoration
        self.C = C
    def __call__(self, *args):                  # On instance creation
        self.wrapped = self.C(*args)
        return self
    def __getattr__(self, attrname):            # On attribute fetch
        return getattr(self.wrapped, attrname)

@Decorator
class C: ...                                    # C = Decorator(C)

x = C()
y = C()                                         # Overwrites x!


This code handles multiple decorated classes (each makes a new Decorator instance) 
and will intercept instance creation calls (each runs __call__). Unlike the prior version,
however, this version fails to handle multiple instances of a given class—each instance 
creation call overwrites the prior saved instance. The original version does support 
multiple instances, because each instance creation call makes a new independent wrap- 
per object. More generally, either of the following patterns supports multiple wrapped 
instances: 


def decorator(C):                               # On @ decoration
    class Wrapper:
        def __init__(self, *args):              # On instance creation
            self.wrapped = C(*args)
    return Wrapper

class Wrapper: ...

def decorator(C):                               # On @ decoration
    def onCall(*args):                          # On instance creation
        return Wrapper(C(*args))                # Embed instance in instance
    return onCall


We'll study this phenomenon in a more realistic context later in the chapter; in practice, 
though, we must be careful to combine callable types properly to support our intent. 
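

The difference is easy to observe. In the following sketch (the Person and Person2 classes are invented for the demonstration), the class-based Decorator hands back the same object for every creation call, while the factory-function version builds a new wrapper each time:

```python
class Decorator(object):                      # Invalid for multiple instances
    def __init__(self, C):
        self.C = C
    def __call__(self, *args):
        self.wrapped = self.C(*args)          # One slot shared by all creation calls
        return self
    def __getattr__(self, attrname):
        return getattr(self.wrapped, attrname)

def decorator(C):                             # Valid: new Wrapper per creation call
    class Wrapper(object):
        def __init__(self, *args):
            self.wrapped = C(*args)
        def __getattr__(self, attrname):
            return getattr(self.wrapped, attrname)
    return Wrapper

@Decorator
class Person(object):
    def __init__(self, name):
        self.name = name

@decorator
class Person2(object):
    def __init__(self, name):
        self.name = name

bob, sue = Person('Bob'), Person('Sue')
print(bob is sue, bob.name)                   # Same object: Bob was overwritten
ann, tom = Person2('Ann'), Person2('Tom')
print(ann is tom, ann.name)                   # Distinct wrappers: Ann survives
```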


Decorator Nesting


Sometimes one decorator isn’t enough. To support multiple steps of augmentation, 
decorator syntax allows you to add multiple layers of wrapper logic to a decorated 
function or method. When this feature is used, each decorator must appear on a line 
of its own. Decorator syntax of this form: 

@A
@B
@C
def f(...):
    ...


runs the same as the following:


def f(...):
    ...
f = A(B(C(f)))


Here, the original function is passed through three different decorators, and the
resulting callable object is assigned back to the original name. Each decorator processes
the result of the prior, which may be the original function or an inserted wrapper.


If all the decorators insert wrappers, the net effect is that when the original function 
name is called, three different layers of wrapping object logic will be invoked, to aug- 
ment the original function in three different ways. The last decorator listed is the first 
applied, and the most deeply nested (insert joke about “interior decorators” here...). 


Just as for functions, multiple class decorators result in multiple nested function calls, 
and possibly multiple levels of wrapper logic around instance creation calls. For ex- 
ample, the following code: 


@spam
@eggs
class C:
    ...

X = C()


is equivalent to the following:


class C:
    ...
C = spam(eggs(C))

X = C()


Again, each decorator is free to return either the original class or an inserted wrapper 
object. With wrappers, when an instance of the original C class is finally requested, the
call is redirected to the wrapping layer objects provided by both the spam and eggs 
decorators, which may have arbitrarily different roles. 


For example, the following do-nothing decorators simply return the decorated
function:


def d1(F): return F
def d2(F): return F
def d3(F): return F

@d1
@d2
@d3
def func():               # func = d1(d2(d3(func)))
    print('spam')

func()                    # Prints "spam"


The same syntax works on classes, as do these same do-nothing decorators.


When decorators insert wrapper function objects, though, they may augment the orig- 
inal function when called—the following concatenates to its result in the decorator 
layers, as it runs the layers from inner to outer: 

def d1(F): return lambda: 'X' + F()
def d2(F): return lambda: 'Y' + F()
def d3(F): return lambda: 'Z' + F()

@d1
@d2
@d3
def func():               # func = d1(d2(d3(func)))
    return 'spam'

print(func())             # Prints "XYZspam"


We use lambda functions to implement wrapper layers here (each retains the wrapped 
function in an enclosing scope); in practice, wrappers can take the form of functions, 
callable classes, and more. When designed well, decorator nesting allows us to combine 
augmentation steps in a wide variety of ways. 
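

To distinguish the order in which nested decorators are applied from the order in which their wrappers later run, we can log both. This is only a sketch: the layer helper and the order list are assumptions introduced for the demonstration, and layer simply builds three similar decorators without repeating their code:

```python
order = []                                # Log of decoration and call events

def layer(label):                         # Hypothetical helper: builds a decorator
    def decorator(F):
        order.append('decorate ' + label)     # Runs at the def statement
        def wrapper(*args):
            order.append('call ' + label)     # Runs on each later call
            return F(*args)
        return wrapper
    return decorator

A, B, C = layer('A'), layer('B'), layer('C')

@A
@B
@C
def f(x):                                 # f = A(B(C(f)))
    return x * 2

result = f(21)
print(order)
print(result)
```

The log shows decoration running C, then B, then A (the last listed is applied first), while a call passes through A's wrapper first and reaches the original function last.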


Decorator Arguments 


Both function and class decorators can also seem to take arguments, although really 
these arguments are passed to a callable that in effect returns the decorator, which in 
turn returns a callable. The following, for instance: 


@decorator(A, B)
def F(arg):
    ...

F(99)


is automatically mapped into this equivalent form, where decorator is a callable that 
returns the actual decorator. The returned decorator in turn returns the callable run 
later for calls to the original function name: 


def F(arg):
    ...
F = decorator(A, B)(F)    # Rebind F to result of decorator's return value
F(99)                     # Essentially calls decorator(A, B)(F)(99)


Decorator arguments are resolved before decoration ever occurs, and they are usually 
used to retain state information for use in later calls. The decorator function in this 
example, for instance, might take a form like the following: 


def decorator(A, B):
    # Save or use A, B
    def actualDecorator(F):
        # Save or use function F
        # Return a callable: nested def, class with __call__, etc.
        return callable
    return actualDecorator


The outer function in this structure generally saves the decorator arguments away as 
state information, for use in the actual decorator, the callable it returns, or both. This 
code snippet retains the state information argument in enclosing function scope refer- 
ences, but class attributes are commonly used as well. 


In other words, decorator arguments often imply three levels of callables: a callable to 
accept decorator arguments, which returns a callable to serve as decorator, which re- 
turns a callable to handle calls to the original function or class. Each of the three levels 
may be a function or class and may retain state in the form of scopes or class attributes. 
We'll see concrete examples of decorator arguments employed later in this chapter. 
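

In the meantime, here is a small sketch of the three levels at work, using nested functions throughout. The surround decorator and its before and after arguments are made up for this illustration:

```python
def surround(before, after):              # Level 1: accepts decorator arguments
    def actualDecorator(F):               # Level 2: the real decorator, receives F
        def wrapper(*args):               # Level 3: runs on later calls
            return before + F(*args) + after
        return wrapper
    return actualDecorator

@surround('<b>', '</b>')                  # bold = surround('<b>', '</b>')(bold)
def bold(text):
    return text

@surround('[', ']')                       # Separate arguments, separate scopes
def boxed(text):
    return text

print(bold('spam'))
print(boxed('spam'))
```

Each use of surround creates its own enclosing scope, so the arguments given to one decoration do not leak into another.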


Decorators Manage Functions and Classes, Too 


Although much of the rest of this chapter focuses on wrapping later calls to functions 
and classes, I should underscore that the decorator mechanism is more general than 
this—it is a protocol for passing functions and classes through a callable immediately 
after they are created. As such, it can also be used to invoke arbitrary post-creation 
processing: 


def decorator(O):
    # Save or augment function or class O
    return O

@decorator
def F(): ...              # F = decorator(F)

@decorator
class C: ...              # C = decorator(C)


As long as we return the original decorated object this way instead of a wrapper, we 
can manage functions and classes themselves, not just later calls to them. We’ll see 
more realistic examples later in this chapter that use this idea to register callable objects 
to an API with decoration and assign attributes to functions when they are created. 


Coding Function Decorators


On to the code—in the rest of this chapter, we are going to study working examples 
that demonstrate the decorator concepts we just explored. This section presents a 
handful of function decorators at work, and the next shows class decorators in action. 
Following that, we’ll close out with some larger case studies of class and function dec- 
orator usage. 


Tracing Calls 


To get started, let’s revive the call tracer example we met in Chapter 31. The following 
defines and applies a function decorator that counts the number of calls made to the 
decorated function and prints a trace message for each call: 


class tracer:
    def __init__(self, func):             # On @ decoration: save original func
        self.calls = 0
        self.func = func
    def __call__(self, *args):            # On later calls: run original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        self.func(*args)

@tracer
def spam(a, b, c):                        # spam = tracer(spam)
    print(a + b + c)                      # Wraps spam in a decorator object


Notice how each function decorated with this class will create a new instance, with its 
own saved function object and calls counter. Also observe how the *args argument 
syntax is used to pack and unpack arbitrarily many passed-in arguments. This gener- 
ality enables this decorator to be used to wrap any function with any number of argu- 
ments (this version doesn’t yet work on class methods, but we'll fix that later in this 
section). 


Now, if we import this module’s function and test it interactively, we get the following 
sort of behavior—each call generates a trace message initially, because the decorator 
class intercepts it. This code runs under both Python 2.6 and 3.0, as does all code in 
this chapter unless otherwise noted: 


>>> from decorator1 import spam 


>>> spam(1, 2, 3) # Really calls the tracer wrapper object 
call 1 to spam 
6 


>>> spam('a', 'b', 'c') # Invokes __call__ in class 
call 2 to spam 
abc 


>>> spam.calls # Number calls in wrapper state information 
2 

>>> spam 
<decorator1.tracer object at 0x02D9A730> 


When run, the tracer class saves away the decorated function, and intercepts later calls
to it, in order to add a layer of logic that counts and prints each call. Notice how the
total number of calls shows up as an attribute of the decorated function: spam is really
an instance of the tracer class when decorated (a finding that may have ramifications
for programs that do type checking, but is generally benign).
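
To make the type-checking remark concrete, the following sketch compares the class-based tracer with a function-based variant. The ftracer name is hypothetical, and the use of the standard library's functools.wraps is my own addition rather than something this section's code relies on:

```python
import functools
import types

class tracer:                                  # Class-based, as in this section
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args):
        self.calls += 1
        return self.func(*args)

def ftracer(func):                             # Hypothetical function-based variant
    @functools.wraps(func)                     # Copies __name__ and __doc__ over
    def wrapper(*args):
        return func(*args)
    return wrapper

@tracer
def spam(a, b, c):
    return a + b + c

@ftracer
def eggs(a, b, c):
    return a + b + c

print(isinstance(spam, types.FunctionType))    # False: spam is a tracer instance
print(isinstance(eggs, types.FunctionType))    # True: eggs is still a function
print(eggs.__name__)                           # eggs
```

Code that tests for function objects explicitly would treat the two decorated names differently, even though both can be called the same way.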


For function calls, the @ decoration syntax can be more convenient than modifying each 
call to account for the extra logic level, and it avoids accidentally calling the original 
function directly. Consider a nondecorator equivalent such as the following: 

calls = 0
def tracer(func, *args):
    global calls
    calls += 1
    print('call %s to %s' % (calls, func.__name__))
    func(*args)

def spam(a, b, c):
    print(a, b, c)

>>> spam(1, 2, 3)            # Normal non-traced call: accidental?
1 2 3

>>> tracer(spam, 1, 2, 3)    # Special traced call without decorators
call 1 to spam
1 2 3


This alternative can be used on any function without the special @ syntax, but unlike 
the decorator version, it requires extra syntax at every place where the function is called 
in your code; furthermore, its intent may not be as obvious, and it does not ensure that 
the extra layer will be invoked for normal calls. Although decorators are never re- 
quired (we can always rebind names manually), they are often the most convenient 
option. 
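
For comparison, manual rebinding can be sketched as follows. This reuses the class-based tracer from earlier in this section, changed only to return the wrapped function's result so the effect is easy to check:

```python
class tracer:
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args):
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args)

def spam(a, b, c):
    return a + b + c

spam = tracer(spam)            # Manual rebinding: what @tracer does for us
print(spam(1, 2, 3))           # Prints the trace line, then 6
print(spam.calls)              # 1
```

The assignment has the same effect as the @ line, but it appears after the def and is easier to overlook or forget.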


State Information Retention Options 


The last example of the prior section raises an important issue. Function decorators 
have a variety of options for retaining state information provided at decoration time, 
for use during the actual function call. They generally need to support multiple deco- 
rated objects and multiple calls, but there are a number of ways to implement these 
goals: instance attributes, global variables, nonlocal variables, and function attributes 
can all be used for retaining state. 

Class instance attributes 


For example, here is an augmented version of the prior example, which adds support 
for keyword arguments and returns the wrapped function’s result to support more use 
cases: 


class tracer:                                # State via instance attributes
    def __init__(self, func):                # On @ decorator
        self.calls = 0                       # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs):     # On call to original function
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)

@tracer
def spam(a, b, c):           # Same as: spam = tracer(spam)
    print(a + b + c)         # Triggers tracer.__init__

@tracer
def eggs(x, y):              # Same as: eggs = tracer(eggs)
    print(x ** y)            # Wraps eggs in a tracer object

spam(1, 2, 3)                # Really calls tracer instance: runs tracer.__call__
spam(a=4, b=5, c=6)          # spam is an instance attribute
eggs(2, 16)                  # Really calls tracer instance, self.func is eggs
eggs(4, y=4)                 # self.calls is per-function here (need 3.0 nonlocal)


Like the original, this uses class instance attributes to save state explicitly. Both the
wrapped function and the calls counter are per-instance information: each decoration
gets its own copy. When run as a script under either 2.6 or 3.0, the output of this version
is as follows; notice how the spam and eggs functions each have their own calls counter,
because each decoration creates a new class instance:

call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256


While useful for decorating functions, this coding scheme has issues when applied to 
methods (more on this later). 


Enclosing scopes and globals 


Enclosing def scope references and nested defs can often achieve the same effect,
especially for static data like the decorated original function. In this example, though, we
would also need a counter in the enclosing scope that changes on each call, and that’s
not possible in Python 2.6. In 2.6, we can either use classes and attributes, as we did
earlier, or move the state variable out to the global scope, with global declarations:


calls = 0
def tracer(func):                          # State via enclosing scope and global
    def wrapper(*args, **kwargs):          # Instead of class attributes
        global calls                       # calls is global, not per-function
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):           # Same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):              # Same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                # Really calls wrapper, bound to func
spam(a=4, b=5, c=6)          # wrapper calls spam
eggs(2, 16)                  # Really calls wrapper, bound to eggs
eggs(4, y=4)                 # Global calls is not per-function here!


Unfortunately, moving the counter out to the common global scope to allow it to be 
changed like this also means that it will be shared by every wrapped function. Unlike 
class instance attributes, global counters are cross-program, not per-function—the 
counter is incremented for any traced function call. You can tell the difference if you 
compare this version’s output with the prior version’s—the single, shared global call 
counter is incorrectly updated by calls to every decorated function: 

call 1 to spam
6
call 2 to spam
15
call 3 to eggs
65536
call 4 to eggs
256


Enclosing scopes and nonlocals 


Shared global state may be what we want in some cases. If we really want a 
per-function counter, though, we can either use classes as before, or make use of the 
new nonlocal statement in Python 3.0, described in Chapter 17. Because this new 
statement allows enclosing function scope variables to be changed, they can serve as 
per-decoration and changeable data: 


def tracer(func):                          # State via enclosing scope and nonlocal
    calls = 0                              # Instead of class attrs or global
    def wrapper(*args, **kwargs):          # calls is per-function, not global
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):           # Same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):              # Same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                # Really calls wrapper, bound to func
spam(a=4, b=5, c=6)          # wrapper calls spam
eggs(2, 16)                  # Really calls wrapper, bound to eggs
eggs(4, y=4)                 # Nonlocal calls _is_ per-function here


Now, because enclosing scope variables are not cross-program globals, each wrapped 
function gets its own counter again, just as for classes and attributes. Here’s the new 
output when run under 3.0: 

call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256


Function attributes 


Finally, if you are not using Python 3.X and don’t have a nonlocal statement, you may
still be able to avoid globals and classes by making use of function attributes for some
changeable state instead. In recent Pythons, we can attach arbitrary attributes to
functions by assigning them, with func.attr=value. In our example, we can simply use
wrapper.calls for state. The following works the same as the preceding nonlocal
version because the counter is again per-decorated-function, but it also runs in Python 2.6:


def tracer(func):                          # State via enclosing scope and func attr
    def wrapper(*args, **kwargs):          # calls is per-function, not global
        wrapper.calls += 1
        print('call %s to %s' % (wrapper.calls, func.__name__))
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


Notice that this only works because the name wrapper is retained in the enclosing 
tracer function’s scope. When we later increment wrapper.calls, we are not changing 
the name wrapper itself, so no nonlocal declaration is required. 

This scheme was almost relegated to a footnote, because it is more obscure than
nonlocal in 3.0 and is probably better saved for cases where other schemes don’t help.
However, we will employ it in an answer to one of the end-of-chapter questions, where
we’ll need to access the saved state from outside the decorator’s code; nonlocals can
only be seen inside the nested function itself, but function attributes have wider
visibility.
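
The wider visibility of function attributes can be sketched briefly. This is a cut-down version of the function-attribute tracer that omits the trace message and returns results, so that only the counter is on display:

```python
def tracer(func):
    def wrapper(*args, **kwargs):
        wrapper.calls += 1              # Attribute on the wrapper function itself
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@tracer
def spam(a, b, c):
    return a + b + c

spam(1, 2, 3)
spam(a=4, b=5, c=6)
print(spam.calls)                       # 2: state is readable from outside
spam.calls = 0                          # ...and can be reset by client code
```

With the nonlocal coding, by contrast, the counter lives only in the enclosing scope and there is no comparably direct way for client code to fetch or reset it.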

Because decorators often imply multiple levels of callables, you can combine functions 
with enclosing scopes and classes with attributes to achieve a variety of coding struc- 
tures. As we'll see later, though, this sometimes may be subtler than you expect—each 
decorated function should have its own state, and each decorated class may require 
state both for itself and for each generated instance. 


In fact, as the next section will explain, if we want to apply function decorators to class 
methods, too, we also have to be careful about the distinction Python makes between 
decorators coded as callable class instance objects and decorators coded as functions. 


Class Blunders I: Decorating Class Methods 


When I wrote the first tracer function decorator above, I naively assumed that it could
also be applied to any method: decorated methods should work the same, but the
automatic self instance argument would simply be included at the front of *args.
Unfortunately, I was wrong: when applied to a class’s method, the first version of the
tracer fails, because self is the instance of the decorator class and the instance of the
decorated subject class is not included in *args. This is true in both Python 3.0 and 2.6.


I introduced this phenomenon earlier in this chapter, but now we can see it in the 
context of realistic working code. Given the class-based tracing decorator: 


class tracer:
    def __init__(self, func):                # On @ decorator
        self.calls = 0                       # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs):     # On call to original function
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)


decoration of simple functions works as advertised earlier: 


@tracer
def spam(a, b, c):           # spam = tracer(spam)
    print(a + b + c)         # Triggers tracer.__init__

spam(1, 2, 3)                # Runs tracer.__call__
spam(a=4, b=5, c=6)          # spam is an instance attribute


However, decoration of class methods fails (more lucid readers might recognize this as
our Person class resurrected from the object-oriented tutorial in Chapter 27):


class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay  = pay

    @tracer
    def giveRaise(self, percent):        # giveRaise = tracer(giveRaise)
        self.pay *= (1.0 + percent)

    @tracer
    def lastName(self):                  # lastName = tracer(lastName)
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)         # tracer remembers method funcs
bob.giveRaise(.25)                       # Runs tracer.__call__(???, .25)
print(bob.lastName())                    # Runs tracer.__call__(???)


The root of the problem here is in the self argument of the tracer class’s __call__ 
method—is it a tracer instance or a Person instance? We really need both as it’s coded: 
the tracer for decorator state, and the Person for routing on to the original method. 
Really, self must be the tracer object, to provide access to tracer’s state information; 
this is true whether decorating a simple function or a method. 


Unfortunately, when our decorated method name is rebound to a class instance object
with a __call__, Python passes only the tracer instance to self; it doesn’t pass along
the Person subject in the arguments list at all. Moreover, because the tracer knows
nothing about the Person instance we are trying to process with method calls, there’s
no way to create a bound method with an instance, and thus no way to correctly
dispatch the call.


In fact, the prior listing winds up passing too few arguments to the decorated method,
and results in an error. Add a line to the decorator’s __call__ to print all its arguments
to verify this; as you can see, self is the tracer, and the Person instance is entirely absent:

<__main__.tracer object at 0x02D6AD90> (0.25,) {}
call 1 to giveRaise
Traceback (most recent call last):
  File "C:/misc/tracer.py", line 56, in <module>
    bob.giveRaise(.25)
  File "C:/misc/tracer.py", line 9, in __call__
    return self.func(*args, **kwargs)
TypeError: giveRaise() takes exactly 2 positional arguments (1 given)


As mentioned earlier, this happens because Python passes the implied subject instance 
to self when a method name is bound to a simple function only; when it is an instance 
of a callable class, that class’s instance is passed instead. Technically, Python only 
makes a bound method object containing the subject instance when the method is a 
simple function. 
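
The distinction can be observed directly. In the following sketch, the names Callable, plain, Subject, viaFunc, and viaInst are all hypothetical; the point is only that a simple function carries a __get__ method (which is what produces the bound method), while an instance of an ordinary callable class does not:

```python
class Callable:                          # Hypothetical callable class, no __get__
    def __call__(self, *args):
        return args

def plain(*args):                        # Simple function: has __get__
    return args

class Subject:
    viaFunc = plain                      # Fetched through an instance: bound method
    viaInst = Callable()                 # Fetched through an instance: same object

X = Subject()
print(hasattr(plain, '__get__'))         # True: functions are descriptors
print(hasattr(Callable(), '__get__'))    # False
print(X.viaFunc()[0] is X)               # True: X was passed in automatically
print(X.viaInst())                       # (): X was not passed along
```

The last two lines mirror the tracer failure: only the attribute bound to a simple function receives the subject instance as its first argument.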

Using nested functions to decorate methods 


If you want your function decorators to work on both simple functions and class meth- 
ods, the most straightforward solution lies in using one of the other state retention 
solutions described earlier—code your function decorator as nested defs, so that you 
don’t depend on a single self instance argument to be both the wrapper class instance 
and the subject class instance. 


The following alternative applies this fix using Python 3.0 nonlocals. Because decorated 
methods are rebound to simple functions instead of instance objects, Python correctly 
passes the Person object as the first argument, and the decorator propagates it on in the 
first item of *args to the self argument of the real, decorated methods: 


# A decorator for both functions and methods

def tracer(func):                        # Use function, not class with __call__
    calls = 0                            # Else "self" is decorator instance only!
    def onCall(*args, **kwargs):
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return onCall

# Applies to simple functions

@tracer
def spam(a, b, c):                       # spam = tracer(spam)
    print(a + b + c)                     # onCall remembers spam

spam(1, 2, 3)                            # Runs onCall(1, 2, 3)
spam(a=4, b=5, c=6)

# Applies to class method functions too!

class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay  = pay

    @tracer
    def giveRaise(self, percent):        # giveRaise = tracer(giveRaise)
        self.pay *= (1.0 + percent)      # onCall remembers giveRaise

    @tracer
    def lastName(self):                  # lastName = tracer(lastName)
        return self.name.split()[-1]

print('methods...')
bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)                       # Runs onCall(sue, .10)
print(sue.pay)
print(bob.lastName(), sue.lastName())    # Runs onCall(bob), lastName in scopes


This version works the same on both functions and methods: 


call 1 to spam
6
call 2 to spam
15
methods...
Bob Smith Sue Jones
call 1 to giveRaise
110000.0
call 1 to lastName
call 2 to lastName
Smith Jones


Using descriptors to decorate methods 


Although the nested function solution illustrated in the prior section is the most 
straightforward way to support decorators that apply to both functions and class meth- 
ods, other schemes are possible. The descriptor feature we explored in the prior chapter, 
for example, can help here as well. 


Recall from our discussion in that chapter that a descriptor may be a class attribute
assigned to objects with a __get__ method run automatically when that attribute is
referenced and fetched (object derivation is required in Python 2.6, but not 3.0):


class Descriptor(object):
    def __get__(self, instance, owner): ...

class Subject:
    attr = Descriptor()

X = Subject()
X.attr         # Roughly runs Descriptor.__get__(Subject.attr, X, Subject)

Descriptors may also have __set__ and __delete__ access methods, but we don’t need
them here. Now, because the descriptor’s __get__ method receives both the descriptor
class and subject class instances when invoked, it’s well suited to decorating methods
when we need both the decorator’s state and the original class instance for dispatching
calls. Consider the following alternative tracing decorator, which is also a descriptor:


class tracer(object):
    def __init__(self, func):                # On @ decorator
        self.calls = 0                       # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs):     # On call to original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):      # On method attribute fetch
        return wrapper(self, instance)

class wrapper:
    def __init__(self, desc, subj):          # Save both instances
        self.desc = desc                     # Route calls back to desc
        self.subj = subj
    def __call__(self, *args, **kwargs):
        return self.desc(self.subj, *args, **kwargs)    # Runs tracer.__call__

@tracer
def spam(a, b, c):                           # spam = tracer(spam)
    ...same as prior...                      # Uses __call__ only

class Person:
    @tracer
    def giveRaise(self, percent):            # giveRaise = tracer(giveRaise)
        ...same as prior...                  # Makes giveRaise a descriptor


This works the same as the preceding nested function coding. Decorated functions
invoke only its __call__, while decorated methods invoke its __get__ first to resolve
the method name fetch (on instance.method); the object returned by __get__ retains
the subject class instance and is then invoked to complete the call expression, thereby
triggering __call__ (on (args...)). For example, the test code’s call to:


sue.giveRaise(.10)                   # Runs __get__ then __call__


runs tracer.__get__ first, because the giveRaise attribute in the Person class has been
rebound to a descriptor by the function decorator. The call expression then triggers the
__call__ method of the returned wrapper object, which in turn invokes
tracer.__call__.


The wrapper object retains both descriptor and subject instances, so it can route control
back to the original decorator/descriptor class instance. In effect, the wrapper object
saves the subject class instance available during method attribute fetch and adds it to
the later call’s arguments list, which is passed to __call__. Routing the call back to the
descriptor class instance this way is required in this application so that all calls to a
wrapped method use the same calls counter state information in the descriptor
instance object.


Alternatively, we could use a nested function and enclosing scope references to achieve 
the same effect—the following version works the same as the preceding one, by swap- 
ping a class and object attributes for a nested function and scope references, but it 
requires noticeably less code: 


class tracer(object):
    def __init__(self, func):                # On @ decorator
        self.calls = 0                       # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs):     # On call to original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):                # On method fetch
        def wrapper(*args, **kwargs):                  # Retain both inst
            return self(instance, *args, **kwargs)     # Runs __call__
        return wrapper


Add print statements to these alternatives’ methods to trace the two-step get/call 
process on your own, and run them with the same test code as in the nested function 
alternative shown earlier. In either coding, this descriptor-based scheme is also sub- 
stantially subtler than the nested function option, and so is probably a second choice 
here; it may be a useful coding pattern in other contexts, though. 
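
For readers who would rather see the two-step process than add the prints themselves, here is one possible sketch. It is the nested-function descriptor coding with a single added print in __get__, applied to a trimmed copy of the Person class; the exact trace text is my own choice:

```python
class tracer(object):
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args, **kwargs):
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):
        print('get %s' % self.func.__name__)      # Added trace of the fetch step
        def wrapper(*args, **kwargs):
            return self(instance, *args, **kwargs)
        return wrapper

class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    @tracer
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    @tracer
    def lastName(self):
        return self.name.split()[-1]

sue = Person('Sue Jones', 100000)
sue.giveRaise(.10)                   # get giveRaise, then call 1 to giveRaise
print(int(round(sue.pay)))           # 110000
print(sue.lastName())                # get lastName, call 1 to lastName, Jones
```

Each method call prints its "get" line before its "call" line, which shows the attribute fetch running ahead of the call expression.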


In the rest of this chapter we’re going to be fairly casual about using classes or functions
to code our function decorators, as long as they are applied only to functions. Some
decorators may not require the instance of the original class, and will still work on both
functions and methods if coded as a class; something like Python’s own
staticmethod decorator, for example, wouldn’t require an instance of the subject class
(indeed, its whole point is to remove the instance from the call).


The moral of this story, though, is that if you want your decorators to work on both 
simple functions and class methods, you're better off using the nested-function-based 
coding pattern outlined here instead of a class with call interception. 


Timing Calls 


To sample the fuller flavor of what function decorators are capable of, let’s turn to a 
different use case. Our next decorator times calls made to a decorated function—both 
the time for one call, and the total time among all calls. The decorator is applied to two 
functions, in order to compare the time requirements of list comprehensions and the 
map built-in call (for comparison, also see Chapter 20 for another nondecorator example 
that times iteration alternatives like these): 


import time

class timer:
    def __init__(self, func):
        self.func    = func
        self.alltime = 0
    def __call__(self, *args, **kargs):
        start   = time.clock()
        result  = self.func(*args, **kargs)
        elapsed = time.clock() - start
        self.alltime += elapsed
        print('%s: %.5f, %.5f' % (self.func.__name__, elapsed, self.alltime))
        return result

@timer
def listcomp(N):
    return [x * 2 for x in range(N)]

@timer
def mapcall(N):
    return map((lambda x: x * 2), range(N))

result = listcomp(5)                # Time for this call, all calls, return value
listcomp(50000)
listcomp(500000)
listcomp(1000000)
print(result)
print('allTime = %s' % listcomp.alltime)      # Total time for all listcomp calls

print('')
result = mapcall(5)
mapcall(50000)
mapcall(500000)
mapcall(1000000)
print(result)
print('allTime = %s' % mapcall.alltime)       # Total time for all mapcall calls

print('map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))


In this case, a nondecorator approach would allow the subject functions to be used 
with or without timing, but it would also complicate the call signature when timing is 
desired (we’d need to add code at every call instead of once at the def), and there would 
be no direct way to guarantee that all list builder calls in a program are routed through 
timer logic, short of finding and potentially changing them all. 


When run in Python 2.6, the output of this file’s self-test code is as follows: 


listcomp: 0.00002, 0.00002 
listcomp: 0.00910, 0.00912 
listcomp: 0.09105, 0.10017 
listcomp: 0.17605, 0.27622 
[0, 2, 4, 6, 8] 

allTime = 0.276223304917 


mapcall: 0.00003, 0.00003 
mapcall: 0.01363, 0.01366 
mapcall: 0.13579, 0.14945 
mapcall: 0.27648, 0.42593 
[0, 2, 4, 6, 8] 

allTime = 0.425933533452 

map/comp = 1.542 


Testing subtlety: I didn’t run this under Python 3.0 because, as described in
Chapter 14, the map built-in returns an iterator in 3.0, instead of an actual list as in 2.6.
Hence, 3.0’s map doesn’t quite compare directly to a list comprehension’s work (as is,
the map test takes virtually no time at all in 3.0).


If you wish to run this under 3.0, too, use list(map()) to force it to build a list like the
list comprehension does, or else you’re not really comparing apples to apples. Don’t
do so in 2.6, though; if you do, the map test will be charged for building two lists, not
one.


The following sort of code would pick the fair version for either 2.6 or 3.0; note,
though, that while this makes the comparison between list comprehensions and map
more fair within either 2.6 or 3.0, because range also generates its items on demand in
3.0 instead of building a list, the results for 2.6 and 3.0 won’t compare directly:


import sys

@timer
def listcomp(N):
    return [x * 2 for x in range(N)]

if sys.version_info[0] == 2:
    @timer
    def mapcall(N):
        return map((lambda x: x * 2), range(N))
else:
    @timer
    def mapcall(N):
        return list(map((lambda x: x * 2), range(N)))


Finally, as we learned in the modules part of this book, if you want to be able to reuse
this decorator in other modules, you should indent the self-test code at the bottom of
the file under a __name__ == '__main__' test so it runs only when the file is run, not
when it’s imported. We won’t do this, though, because we’re about to add another
feature to our code.
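
For reference, the following is a sketch of what that arrangement might look like on a much later Python than the ones this chapter targets. It makes one substitution that is my assumption rather than part of the original code: time.clock was removed in Python 3.8, so time.perf_counter (available since 3.3) stands in for it here. The self-test is also shortened:

```python
import time

# Assumption: time.perf_counter (Python 3.3+) stands in for time.clock
class timer:
    def __init__(self, func):
        self.func = func
        self.alltime = 0
    def __call__(self, *args, **kargs):
        start = time.perf_counter()
        result = self.func(*args, **kargs)
        elapsed = time.perf_counter() - start
        self.alltime += elapsed
        print('%s: %.5f, %.5f' % (self.func.__name__, elapsed, self.alltime))
        return result

if __name__ == '__main__':               # Self-test: skipped on import
    @timer
    def listcomp(N):
        return [x * 2 for x in range(N)]

    @timer
    def mapcall(N):
        return list(map((lambda x: x * 2), range(N)))

    for func in (listcomp, mapcall):
        result = func(5)
        func(50000)
        print(result)
        print('allTime = %s' % func.alltime)
```

Importing this file from another module makes timer available without running or printing any of the timing tests.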


Adding Decorator Arguments 


The timer decorator of the prior section works, but it would be nice if it was more
configurable: providing an output label and turning trace messages on and off, for
instance, might be useful in a general-purpose tool like this. Decorator arguments come
in handy here: when they’re coded properly, we can use them to specify configuration
options that can vary for each decorated function. A label, for instance, might be added
as follows:


def timer(label=''):
    def decorator(func):
        def onCall(*args):          # args passed to function
            ...                     # func retained in enclosing scope
            print(label, ...)       # label retained in enclosing scope
        return onCall
    return decorator                # Returns the actual decorator

@timer('==>')                       # Like listcomp = timer('==>')(listcomp)
def listcomp(N): ...                # listcomp is rebound to onCall, decorator's result
listcomp(...)                       # Really calls onCall


This code adds an enclosing scope to retain a decorator argument for use on a later
actual call. When the listcomp function is defined, it really invokes decorator (the result
of timer, run before decoration actually occurs), with the label value available in its
enclosing scope. That is, timer returns the decorator, which remembers both the
decorator argument and the original function and returns a callable which invokes the
original function on later calls.
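
A complete, runnable rendering of this three-level structure might look like the following sketch. The name labeled is hypothetical, and the body of onCall is my own filler for the parts the outline above leaves out; it only labels and prints each result:

```python
def labeled(label=''):                       # Hypothetical name; takes decorator args
    def decorator(func):                     # On @: receives the function
        def onCall(*args):                   # On calls: label and func are in scope
            result = func(*args)
            print(label, func.__name__, result)
            return result
        return onCall
    return decorator

@labeled('==>')                              # Like double = labeled('==>')(double)
def double(N):
    return [x * 2 for x in range(N)]

double(3)                                    # Prints: ==> double [0, 2, 4]
print(double.__name__)                       # onCall: the name is rebound to onCall
```

The outer call runs once per decoration, the middle function runs once when the def is processed, and only the innermost function runs on each later call.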


We can put this structure to use in our timer to allow a label and a trace control flag to 
be passed in at decoration time. Here’s an example that does just that, coded in a 
module file named mytools.py so it can be imported as a general tool: 


import time

def timer(label='', trace=True):                   # On decorator args: retain args
    class Timer:
        def __init__(self, func):                  # On @: retain decorated func
            self.func    = func
            self.alltime = 0
        def __call__(self, *args, **kargs):        # On calls: call original
            start   = time.clock()
            result  = self.func(*args, **kargs)
            elapsed = time.clock() - start
            self.alltime += elapsed
            if trace:
                format = '%s %s: %.5f, %.5f'
                values = (label, self.func.__name__, elapsed, self.alltime)
                print(format % values)
            return result
    return Timer


Mostly all we’ve done here is embed the original Timer class in an enclosing function, 
in order to create a scope that retains the decorator arguments. The outer timer function 
is called before decoration occurs, and it simply returns the Timer class to serve as the 
actual decorator. On decoration, an instance of Timer is made that remembers the dec- 
orated function itself, but also has access to the decorator arguments in the enclosing 
function scope. 


This time, rather than embedding self-test code in this file, we’ll run the decorator in 
a different file. Here’s a client of our timer decorator, the module file testseqs.py, ap- 
plying it to sequence iteration alternatives again: 


from mytools import timer

@timer(label='[CCC]==>')
def listcomp(N):                             # Like listcomp = timer(...)(listcomp)
    return [x * 2 for x in range(N)]         # listcomp(...) triggers Timer.__call__

@timer(trace=True, label='[MMM]==>')
def mapcall(N):
    return map((lambda x: x * 2), range(N))

for func in (listcomp, mapcall):
    print('')
    result = func(5)                 # Time for this call, all calls, return value
    func(50000)
    func(500000)
    func(1000000)
    print(result)
    print('allTime = %s' % func.alltime)     # Total time for all calls

print('map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))


Again, if you wish to run this fairly in 3.0, wrap the map function in a list call. When 
run as-is in 2.6, this file prints the following output—each decorated function now has 
a label of its own, defined by decorator arguments: 


[CCC]==> listcomp: 0.00003, 0.00003 
[CCC]==> listcomp: 0.00640, 0.00643 
[CCC]==> listcomp: 0.08687, 0.09330 
[CCC]==> listcomp: 0.17911, 0.27241 
[0, 2, 4, 6, 8] 

allTime = 0.272407666337 


[MMM]==> mapcall: 0.00004, 0.00004 
[MMM]==> mapcall: 0.01340, 0.01343 
[MMM]==> mapcall: 0.13907, 0.15250 
[MMM]==> mapcall: 0.27907, 0.43157 
[0, 2, 4, 6, 8] 

allTime = 0.431572169089 

map/comp = 1.584 


As usual, we can also test this interactively to see how the configuration arguments 
come into play: 


>>> from mytools import timer
>>> @timer(trace=False)                      # No tracing, collect total time
... def listcomp(N):
...     return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> listcomp
<mytools.Timer instance at 0x025C77B0>
>>> listcomp.alltime
0.0051938863738243413

>>> @timer(trace=True, label='\t=>')         # Turn on tracing
... def listcomp(N):
...     return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
        => listcomp: 0.00155, 0.00155
>>> x = listcomp(5000)
        => listcomp: 0.00156, 0.00311
>>> x = listcomp(5000)
        => listcomp: 0.00174, 0.00486
>>> listcomp.alltime
0.0048562736325408196


This timing function decorator can be used for any function, both in modules and
interactively. In other words, it automatically qualifies as a general-purpose tool for
timing code in our scripts. Watch for another example of decorator arguments in the
section “Implementing Private Attributes” on page 1023, and again in “A Basic
Range-Testing Decorator for Positional Arguments” on page 1035.


Timing methods: This section’s timer decorator works on any function,
but a minor rewrite is required to be able to apply it to class methods
too. In short, as our earlier section “Class Blunders I: Decorating Class
Methods” on page 1001 illustrated, it must avoid using a nested class.
Because this mutation will be a subject of one of our end-of-chapter quiz
questions, though, I’ll avoid giving away the answer completely here.


Coding Class Decorators 


So far we’ve been coding function decorators to manage function calls, but as we’ve 
seen, Python 2.6 and 3.0 extend decorators to work on classes too. As described earlier, 
while similar in concept to function decorators, class decorators are applied to classes 
instead—they may be used either to manage classes themselves, or to intercept instance 
creation calls in order to manage instances. Also like function decorators, class deco- 
rators are really just optional syntactic sugar, though many believe that they make a 
programmer’s intent more obvious and minimize erroneous calls. 


Singleton Classes 


Because class decorators may intercept instance creation calls, they can be used to either 
manage all the instances of a class, or augment the interfaces of those instances. To 
demonstrate, here’s a first class decorator example that does the former—managing all 
instances of a class. This code implements the classic singleton coding pattern, where 
at most one instance of a class ever exists. Its singleton function defines and returns a 
function for managing instances, and the @ syntax automatically wraps up a subject 
class in this function: 


instances = {}

def getInstance(aClass, *args):                 # Manage global table
    if aClass not in instances:                 # Add **kargs for keywords
        instances[aClass] = aClass(*args)       # One dict entry per class
    return instances[aClass]

def singleton(aClass):                          # On @ decoration
    def onCall(*args):                          # On instance creation
        return getInstance(aClass, *args)
    return onCall


To use this, decorate the classes for which you want to enforce a single-instance model: 


@singleton                                      # Person = singleton(Person)
class Person:                                   # Rebinds Person to onCall
    def __init__(self, name, hours, rate):      # onCall remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):
        return self.hours * self.rate

@singleton                                      # Spam = singleton(Spam)
class Spam:                                     # Rebinds Spam to onCall
    def __init__(self, val):                    # onCall remembers Spam
        self.attr = val

bob = Person('Bob', 40, 10)                     # Really calls onCall
print(bob.name, bob.pay())

sue = Person('Sue', 50, 20)                     # Same, single object
print(sue.name, sue.pay())

X = Spam(42)                                    # One Person, one Spam
Y = Spam(99)
print(X.attr, Y.attr)


Now, when the Person or Spam class is later used to create an instance, the wrapping 
logic layer provided by the decorator routes instance construction calls to onCall, which 
in turn calls getInstance to manage and share a single instance per class, regardless of 
how many construction calls are made. Here’s this code’s output: 

Bob 400
Bob 400
42 42


Interestingly, you can code a more self-contained solution here if you’re able to use the
nonlocal statement (available in Python 3.0 and later) to change enclosing scope names,
as described earlier. The following alternative achieves an identical effect by using one
enclosing scope per class, instead of one global table entry per class:


def singleton(aClass):                          # On @ decoration
    instance = None
    def onCall(*args):                          # On instance creation
        nonlocal instance                       # 3.0 and later nonlocal
        if instance == None:
            instance = aClass(*args)            # One scope per class
        return instance
    return onCall


This version works the same, but it does not depend on names in the global scope 
outside the decorator. In either Python 2.6 or 3.0, you can also code a self-contained 
solution with a class instead—the following uses one instance per class, rather than an 
enclosing scope or global table, and works the same as the other two versions (in fact, 
it relies on the same coding pattern that we will later see is a common decorator class 
blunder; here we want just one instance, but that’s not always the case): 

class singleton:
    def __init__(self, aClass):                 # On @ decoration
        self.aClass = aClass
        self.instance = None
    def __call__(self, *args):                  # On instance creation
        if self.instance == None:
            self.instance = self.aClass(*args)  # One instance per class
        return self.instance


To make this decorator a fully general-purpose tool, store it in an importable module 
file, indent the self-test code under a __name__ check, and add support for keyword
arguments in construction calls with **kargs syntax (I'll leave this as a suggested 
exercise). 
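
If you'd like to check your answer to that exercise, here is one possible sketch, not the only way to code it. It extends the nonlocal-based version to pass keyword arguments along with **kargs; the Person class and its argument values are simply carried over from the example above for illustration. Note that, as before, arguments to every construction call after the first are ignored:

```python
def singleton(aClass):                          # On @ decoration
    instance = None
    def onCall(*args, **kargs):                 # On instance creation
        nonlocal instance                       # 3.0 and later only
        if instance is None:
            instance = aClass(*args, **kargs)   # Built on the first call only
        return instance                         # Later arguments are ignored
    return onCall

@singleton
class Person:
    def __init__(self, name, hours, rate):
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):
        return self.hours * self.rate

bob = Person('Bob', hours=40, rate=10)          # Keywords now allowed
sue = Person('Sue', rate=20, hours=50)          # Same, single object
print(bob.name, sue.name, bob is sue)
print(bob.pay())
```

In a module file, the code from the Person class down would be indented under an `if __name__ == '__main__':` test so that it runs only when the file is run as a script.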


Tracing Object Interfaces 


The singleton example of the prior section illustrated using class decorators to manage 
all the instances of a class. Another common use case for class decorators augments 
the interface of each generated instance. Class decorators can essentially install on in- 
stances a wrapper logic layer that manages access to their interfaces in some way. 


For example, in Chapter 30, the __getattr__ operator overloading method is shown as
a way to wrap up entire object interfaces of embedded instances, in order to implement 
the delegation coding pattern. We saw similar examples in the managed attribute cov- 
erage of the prior chapter. Recall that __getattr__ is run when an undefined attribute 
name is fetched; we can use this hook to intercept method calls in a controller class 
and propagate them to an embedded object. 


For reference, here’s the original nondecorator delegation example, working on two 
built-in type objects: 


class Wrapper:
    def __init__(self, object):
        self.wrapped = object                   # Save object
    def __getattr__(self, attrname):
        print('Trace:', attrname)               # Trace fetch
        return getattr(self.wrapped, attrname)  # Delegate fetch


>>> x = Wrapper([1,2,3])                        # Wrap a list
>>> x.append(4)                                 # Delegate to list method
Trace: append
>>> x.wrapped                                   # Print my member
[1, 2, 3, 4]

>>> x = Wrapper({"a": 1, "b": 2})               # Wrap a dictionary
>>> list(x.keys())                              # Delegate to dictionary method
Trace: keys                                     # Use list() in 3.0
['a', 'b']

In this code, the Wrapper class intercepts access to any of the wrapped object’s attributes, 
prints a trace message, and uses the getattr built-in to pass off the request to the 
wrapped object. Specifically, it traces attribute accesses made outside the wrapped ob- 
ject’s class; accesses inside the wrapped object’s methods are not caught and run nor- 
mally by design. This whole-interface model differs from the behavior of function dec- 
orators, which wrap up just one specific method. 

Class decorators provide an alternative and convenient way to code this __getattr__
technique to wrap an entire interface. In 2.6 and 3.0, for example, the prior class ex- 
ample can be coded as a class decorator that triggers wrapped instance creation, instead 
of passing a pre-made instance into the wrapper’s constructor (also augmented here to 
support keyword arguments with **kargs and to count the number of accesses made): 


def Tracer(aClass):                                   # On @ decorator
    class Wrapper:
        def __init__(self, *args, **kargs):           # On instance creation
            self.fetches = 0
            self.wrapped = aClass(*args, **kargs)     # Use enclosing scope name
        def __getattr__(self, attrname):
            print('Trace: ' + attrname)               # Catches all but own attrs
            self.fetches += 1
            return getattr(self.wrapped, attrname)    # Delegate to wrapped obj
    return Wrapper

@Tracer
class Spam:                                    # Spam = Tracer(Spam)
    def display(self):                         # Spam is rebound to Wrapper
        print('Spam!' * 8)

@Tracer
class Person:                                  # Person = Tracer(Person)
    def __init__(self, name, hours, rate):     # Wrapper remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):                             # Accesses outside class traced
        return self.hours * self.rate          # In-method accesses not traced

food = Spam()                                  # Triggers Wrapper()
food.display()                                 # Triggers __getattr__
print([food.fetches])

bob = Person('Bob', 40, 50)                    # bob is really a Wrapper
print(bob.name)                                # Wrapper embeds a Person
print(bob.pay())

print('')
sue = Person('Sue', rate=100, hours=60)        # sue is a different Wrapper
print(sue.name)                                # with a different Person
print(sue.pay())

print(bob.name)                                # bob has different state
print(bob.pay())
print([bob.fetches, sue.fetches])              # Wrapper attrs not traced


It’s important to note that this is very different from the tracer decorator we met earlier. 
In “Coding Function Decorators” on page 996, we looked at decorators that enabled 
us to trace and time calls to a given function or method. In contrast, by intercepting 
instance creation calls, the class decorator here allows us to trace an entire object 
interface—i.e., accesses to any of its attributes. 

The following is the output produced by this code under both 2.6 and 3.0: attribute
fetches on instances of both the Spam and Person classes invoke the __getattr__ logic
in the Wrapper class, because food and bob are really instances of Wrapper, thanks to the 
decorator’s redirection of instance creation calls: 

Trace: display
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
[1]
Trace: name
Bob
Trace: pay
2000

Trace: name
Sue
Trace: pay
6000
Trace: name
Bob
Trace: pay
2000
[4, 2]


Notice that the preceding code decorates a user-defined class. Just like in the original 
example in Chapter 30, we can also use the decorator to wrap up a built-in type such 
as a list, as long as we either subclass to allow decoration syntax or perform the deco- 
ration manually—decorator syntax requires a class statement for the @ line. 


In the following, x is really a Wrapper again due to the indirection of decoration (I moved 
the decorator class to module file tracer.py in order to reuse it this way): 


>>> from tracer import Tracer                  # Decorator moved to a module file

>>> @Tracer
... class MyList(list): pass                   # MyList = Tracer(MyList)

>>> x = MyList([1, 2, 3])                      # Triggers Wrapper()
>>> x.append(4)                                # Triggers __getattr__, append
Trace: append
>>> x.wrapped
[1, 2, 3, 4]

>>> WrapList = Tracer(list)                    # Or perform decoration manually
>>> x = WrapList([4, 5, 6])                    # Else subclass statement required
>>> x.append(7)
Trace: append
>>> x.wrapped
[4, 5, 6, 7]


The decorator approach allows us to move instance creation into the decorator itself, 
instead of requiring a premade object to be passed in. Although this seems like a minor 
difference, it lets us retain normal instance creation syntax and realize all the benefits 
of decorators in general. Rather than requiring all instance creation calls to route objects
through a wrapper manually, we need only augment classes with decorator syntax: 
@Tracer # Decorator approach 
class Person: ... 
bob = Person('Bob', 40, 50) 
sue = Person('Sue', rate=100, hours=60) 


class Person: ... # Non-decorator approach 
bob = Wrapper(Person('Bob', 40, 50)) 
sue = Wrapper(Person('Sue', rate=100, hours=60)) 


Assuming you will make more than one instance of a class, decorators will generally 
be a net win in terms of both code size and code maintenance. 


Attribute version skew note: As we learned in Chapter 37, __getattr__
will intercept accesses to operator overloading methods like __str__ and
__repr__ in Python 2.6, but not in 3.0.


In Python 3.0, class instances inherit defaults for some (but not all) of 
these names from the class (really, from the automatic object super- 
class), because all classes are “new-style.” Moreover, in 3.0 implicitly 
invoked attributes for built-in operations like printing and + are not 
routed through __getattr__ (or its cousin, __getattribute__). New-style
classes look up such methods in classes and skip the normal
instance lookup entirely. 


Here, this means that the __getattr__-based tracing wrapper will automatically
trace and propagate operator overloading calls in 2.6, but not
in 3.0. To see this, display “x” directly at the end of the preceding interactive
session: in 2.6 the attribute __repr__ is traced and the list
prints as expected, but in 3.0 no trace occurs and the list prints using a
default display for the Wrapper class:


>>> x                                          # 2.6
Trace: __repr__
[4, 5, 6, 7]

>>> x                                          # 3.0
<tracer.Wrapper object at 0x026C07D0>


To work the same in 3.0, operator overloading methods generally need 
to be redefined redundantly in the wrapper class, either by hand, by 
tools, or by definition in superclasses. Only simple named attributes will 
work the same in both versions. We’ll see this version skew at work 
again in a Private decorator later in this chapter. 
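
As a rough illustration of the "by hand" option just mentioned, the following sketch redefines two operator overloading methods in the wrapper so that they are traced and delegated in 3.0 as well. The choice of __repr__ and __len__ is arbitrary and mine; a complete wrapper would need one such method for every built-in operation that wrapped objects must support:

```python
def Tracer(aClass):
    class Wrapper:
        def __init__(self, *args, **kargs):
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):           # Named attributes only in 3.0
            print('Trace: ' + attrname)
            return getattr(self.wrapped, attrname)
        def __repr__(self):                        # Redefined by hand: printing
            print('Trace: __repr__')
            return repr(self.wrapped)
        def __len__(self):                         # Redefined by hand: len()
            print('Trace: __len__')
            return len(self.wrapped)
    return Wrapper

WrapList = Tracer(list)                            # Manual decoration
x = WrapList([4, 5, 6])
x.append(7)                                        # Routed through __getattr__
print(repr(x))                                     # Routed through __repr__
print(len(x))                                      # Routed through __len__
```

Because these methods are found in the wrapper class itself, built-in operations reach them without going through __getattr__.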


Class Blunders II: Retaining Multiple Instances 


Curiously, the decorator function in this example can almost be coded as a class instead 
of a function, with the proper operator overloading protocol. The following slightly 
simplified alternative works similarly because its __init__ is triggered when the @ decorator
is applied to the class, and its __call__ is triggered when a subject class instance
is created. Our objects are really instances of Tracer this time, and we essentially just
trade an enclosing scope reference for an instance attribute here:


class Tracer:
    def __init__(self, aClass):               # On @decorator
        self.aClass = aClass                  # Use instance attribute
    def __call__(self, *args):                # On instance creation
        self.wrapped = self.aClass(*args)     # ONE (LAST) INSTANCE PER CLASS!
        return self
    def __getattr__(self, attrname):
        print('Trace: ' + attrname)
        return getattr(self.wrapped, attrname)

@Tracer                                       # Triggers __init__
class Spam:                                   # Like: Spam = Tracer(Spam)
    def display(self):
        print('Spam!' * 8)

food = Spam()                                 # Triggers __call__
food.display()                                # Triggers __getattr__


As we saw in the abstract earlier, though, this class-only alternative handles multiple 
classes as before, but it won’t quite work for multiple instances of a given class: each 
instance construction call triggers __call__, which overwrites the prior instance. The
net effect is that Tracer saves just one instance—the last one created. Experiment with 
this yourself to see how, but here’s an example of the problem: 


@Tracer
class Person:                                 # Person = Tracer(Person)
    def __init__(self, name):                 # Wrapper bound to Person
        self.name = name

bob = Person('Bob')                           # bob is really a Wrapper
print(bob.name)                               # Wrapper embeds a Person
sue = Person('Sue')
print(sue.name)                               # sue overwrites bob
print(bob.name)                               # OOPS: now bob's name is 'Sue'!


This code’s output follows; because this tracer only has a single shared instance, the
second overwrites the first:


Trace: name 
Bob 
Trace: name 
Sue 
Trace: name 
Sue 


The problem here is bad state retention—we make one decorator instance per class, 
but not per class instance, such that only the last instance is retained. The solution, as 
in our prior class blunder for decorating methods, lies in abandoning class-based
decorators.

The earlier function-based Tracer version does work for multiple instances, because
each instance construction call makes a new Wrapper instance, instead of overwriting 
the state of a single shared Tracer instance; the original nondecorator version handles 
multiple instances correctly for the same reason. Decorators are not only arguably 
magical, they can also be incredibly subtle! 
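
To see the difference side by side, here is a small sketch that applies both coding styles to the same class by manual decoration. The names ClassTracer, FuncTracer, Shared, and Separate are made up for this comparison, and the trace prints are omitted to keep the focus on object identity:

```python
class ClassTracer:                                # Class-based: one shared object
    def __init__(self, aClass):
        self.aClass = aClass
    def __call__(self, *args):
        self.wrapped = self.aClass(*args)         # Overwrites the prior instance
        return self
    def __getattr__(self, attrname):
        return getattr(self.wrapped, attrname)

def FuncTracer(aClass):                           # Function-based: one Wrapper per call
    class Wrapper:
        def __init__(self, *args):
            self.wrapped = aClass(*args)
        def __getattr__(self, attrname):
            return getattr(self.wrapped, attrname)
    return Wrapper

class Person:
    def __init__(self, name):
        self.name = name

Shared = ClassTracer(Person)                      # Manual decoration, to reuse Person
Separate = FuncTracer(Person)

bob1, sue1 = Shared('Bob'), Shared('Sue')
bob2, sue2 = Separate('Bob'), Separate('Sue')
print(bob1 is sue1, bob1.name)                    # True Sue
print(bob2 is sue2, bob2.name)                    # False Bob
```

With the class-based coding, both names reference the single ClassTracer object, so the first person's state is lost; with the function-based coding, each call builds a new Wrapper with its own embedded Person.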


Decorators Versus Manager Functions 


Regardless of such subtleties, the Tracer class decorator example ultimately still relies 
on __getattr__ to intercept fetches on a wrapped and embedded instance object. As
we saw earlier, all we’ve really accomplished is moving the instance creation call inside 
a class, instead of passing the instance into a manager function. With the original non- 
decorator tracing example, we would simply code instance creation differently: 


class Spam:                                   # Non-decorator version
    ...                                       # Any class will do
food = Wrapper(Spam())                        # Special creation syntax

@Tracer
class Spam:                                   # Decorator version
    ...                                       # Requires @ syntax at class
food = Spam()                                 # Normal creation syntax


Essentially, class decorators shift special syntax requirements from the instance creation 
call to the class statement itself. This is also true for the singleton example earlier in 
this section—rather than decorating a class and using normal instance creation calls, 
we could simply pass the class and its construction arguments into a manager function: 

instances = {}
def getInstance(aClass, *args):
    if aClass not in instances:
        instances[aClass] = aClass(*args)
    return instances[aClass]

bob = getInstance(Person, 'Bob', 40, 10)      # Versus: bob = Person('Bob', 40, 10)


Alternatively, we could use Python’s introspection facilities to fetch the class from an 
already-created instance (assuming creating an initial instance is acceptable): 

instances = {}
def getInstance(object):
    aClass = object.__class__
    if aClass not in instances:
        instances[aClass] = object
    return instances[aClass]

bob = getInstance(Person('Bob', 40, 10))      # Versus: bob = Person('Bob', 40, 10)


The same holds true for function decorators like the tracer we wrote earlier: rather than 
decorating a function with logic that intercepts later calls, we could simply pass the 
function and its arguments into a manager that dispatches the call: 

def func(x, y):                               # Nondecorator version
    ...                                       # def tracer(func, args): ... func(*args)
result = tracer(func, (1, 2))                 # Special call syntax

@tracer
def func(x, y):                               # Decorator version
    ...                                       # Rebinds name: func = tracer(func)
result = func(1, 2)                           # Normal call syntax


Manager function approaches like this place the burden of using special syntax on 
calls, instead of expecting decoration syntax at function and class definitions. 


Why Decorators? (Revisited) 


So why did I just show you ways to not use decorators to implement singletons? As I 
mentioned at the start of this chapter, decorators present us with tradeoffs. Although 
syntax matters, we all too often forget to ask the “why” questions when confronted 
with new tools. Now that we’ve seen how decorators actually work, let’s step back for 
a minute to glimpse the big picture here. 


Like most language features, decorators have both pros and cons. For example, in the 
negatives column, class decorators suffer from two potential drawbacks: 


Type changes 
As we’ve seen, when wrappers are inserted, a decorated function or class does not 
retain its original type—its name is rebound to a wrapper object, which might 
matter in programs that use object names or test object types. In the singleton 
example, both the decorator and manager function approaches retain the original 
class type for instances; in the tracer code, neither approach does, because wrap- 
pers are required. 


Extra calls 
A wrapping layer added by decoration incurs the additional performance cost of 
an extra call each time the decorated object is invoked—calls are relatively time- 
expensive operations, so decoration wrappers can make a program slower. In the 
tracer code, both approaches require each attribute to be routed through a wrapper 
layer; the singleton example avoids extra calls by retaining the original class type. 


Similar concerns apply with function decorators: both decoration and manager func- 
tions incur extra calls, and type changes generally occur when decorating (but not 
otherwise). 
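
The type change issue is easy to verify interactively. In the following sketch, compact versions of the singleton and tracer decorators (without trace prints) are applied to two throwaway classes, whose names Spam and Eggs are arbitrary, and the types of the results are displayed:

```python
def singleton(aClass):                        # Returns original-class instances
    instance = None
    def onCall(*args, **kargs):
        nonlocal instance
        if instance is None:
            instance = aClass(*args, **kargs)
        return instance
    return onCall

def Tracer(aClass):                           # Returns Wrapper instances instead
    class Wrapper:
        def __init__(self, *args, **kargs):
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):
            return getattr(self.wrapped, attrname)
    return Wrapper

@singleton
class Spam: pass

@Tracer
class Eggs: pass

print(type(Spam).__name__)                    # function: Spam is rebound to onCall
print(type(Spam()).__name__)                  # Spam: instances keep the class type
print(type(Eggs()).__name__)                  # Wrapper: instances are wrappers
```

Code that tests instance types would behave the same after the singleton decoration, but not after the tracer decoration; in both cases, the decorated name itself no longer references the original class.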


That said, neither of these is a very serious issue. For most programs, the type difference 
issue is unlikely to matter and the speed hit of the extra calls will be insignificant; 
furthermore, the latter occurs only when wrappers are used, can often be negated by 
simply removing the decorator when optimal performance is required, and is also in- 
curred by nondecorator solutions that add wrapping logic (including metaclasses, as 
we'll see in Chapter 39). 


Conversely, as we saw at the start of this chapter, decorators have three main advan- 
tages. Compared to the manager (a.k.a. “helper”) function solutions of the prior sec- 
tion, decorators offer: 


Explicit syntax 
Decorators make augmentation explicit and obvious. Their @ syntax is easier to 
recognize than special code in calls that may appear anywhere in a source file—in 
our singleton and tracer examples, for instance, the decorator lines seem more 
likely to be noticed than extra code at calls would be. Moreover, decorators allow 
function and instance creation calls to use normal syntax familiar to all Python 
programmers. 


Code maintenance 
Decorators avoid repeated augmentation code at each function or class call. Be- 
cause they appear just once, at the definition of the class or function itself, they 
obviate redundancy and simplify future code maintenance. For our singleton and 
tracer cases, we need to use special code at each call to use a manager function 
approach—extra work is required both initially and for any modifications that 
must be made in the future. 


Consistency 

Decorators make it less likely that a programmer will forget to use required wrap- 
ping logic. This derives mostly from the two prior advantages—because decoration 
is explicit and appears only once, at the decorated objects themselves, decorators 
promote more consistent and uniform API usage than special code that must be 
included at each call. In the singleton example, for instance, it would be easy to 
forget to route all class creation calls through special code, which would subvert 
the singleton management altogether. 


Decorators also promote code encapsulation to reduce redundancy and minimize future 
maintenance effort; although other code structuring tools do too, decorators make this 
natural for augmentation tasks. 


None of these benefits completely requires decorator syntax to be achieved, though, 
and decorator usage is ultimately a stylistic choice. That said, most programmers find 
them to be a net win, especially as a tool for using libraries and APIs correctly. 


I can recall similar arguments being made both for and against constructor functions 
in classes—prior to the introduction of __init__ methods, the same effect was often 
achieved by running an instance through a method manually when creating it (e.g., 
X=Class().init()). Over time, though, despite being fundamentally a stylistic choice, 
the __init__ syntax came to be universally preferred because it was more explicit, consistent,
and maintainable. Although you should be the judge, decorators seem to bring
many of the same assets to the table. 


Managing Functions and Classes Directly


Most of our examples in this chapter have been designed to intercept function and 
instance creation calls. Although this is typical for decorators, they are not limited to 
this role. Because decorators work by running new functions and classes through dec- 
orator code, they can also be used to manage function and class objects themselves, 
not just later calls made to them. 


Imagine, for example, that you require methods or classes used by an application to be 
registered to an API for later processing (perhaps that API will call the objects later, in 
response to events). Although you could provide a registration function to be called 
manually after the objects are defined, decorators make your intent more explicit. 


The following simple implementation of this idea defines a decorator that can be ap- 
plied to both functions and classes, to add the object to a dictionary-based registry. 
Because it returns the object itself instead of a wrapper, it does not intercept later calls: 


# Registering decorated objects to an API

registry = {}

def register(obj):                          # Both class and func decorator
    registry[obj.__name__] = obj            # Add to registry
    return obj                              # Return obj itself, not a wrapper

@register
def spam(x):
    return(x ** 2)                          # spam = register(spam)

@register
def ham(x):
    return(x ** 3)

@register
class Eggs:                                 # Eggs = register(Eggs)
    def __init__(self, x):
        self.data = x ** 4
    def __str__(self):
        return str(self.data)

print('Registry:')
for name in registry:
    print(name, '=>', registry[name], type(registry[name]))

print('\nManual calls:')
print(spam(2))                              # Invoke objects manually
print(ham(2))                               # Later calls not intercepted
X = Eggs(2)
print(X)

print('\nRegistry calls:')
for name in registry:
    print(name, '=>', registry[name](3))    # Invoke from registry


When this code is run the decorated objects are added to the registry by name, but they 
still work as originally coded when they’re called later, without being routed through 
a wrapper layer. In fact, our objects can be run both manually and from inside the 
registry table: 

Registry:
Eggs => <class '__main__.Eggs'> <class 'type'>
ham => <function ham at 0x02CFB738> <class 'function'>
spam => <function spam at 0x02CFB6F0> <class 'function'>

Manual calls:
4
8
16

Registry calls:
Eggs => 81
ham => 27
spam => 9


A user interface might use this technique, for example, to register callback handlers for 
user actions. Handlers might be registered by function or class name, as done here, or 
decorator arguments could be used to specify the subject event; an extra def statement 
enclosing our decorator could be used to retain such arguments for use on decoration. 
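
As a sketch of that last idea, the following adds an enclosing def to retain an event name given as a decorator argument. Everything here, including the handlers table, the on and dispatch functions, and the event names, is hypothetical and meant only to show the structure, not any particular GUI toolkit's API:

```python
handlers = {}                                 # Hypothetical event -> callbacks table

def on(event):                                # Extra def retains the event argument
    def register(func):                       # The actual decorator
        handlers.setdefault(event, []).append(func)
        return func                           # Return func itself, not a wrapper
    return register

@on('click')                                  # highlight = on('click')(highlight)
def highlight(widget):
    return 'highlight ' + widget

@on('click')
def log(widget):
    return 'log ' + widget

@on('close')
def save(widget):
    return 'save ' + widget

def dispatch(event, widget):                  # Run every handler for an event
    return [func(widget) for func in handlers.get(event, [])]

print(dispatch('click', 'button'))            # ['highlight button', 'log button']
print(dispatch('close', 'window'))            # ['save window']
```

As in the registry above, the decorated functions are returned unchanged, so they can still be called directly without any wrapper overhead.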


This example is artificial, but its technique is very general. For example, function dec- 
orators might also be used to process function attributes, and class decorators might 
insert new class attributes, or even new methods, dynamically. Consider the following 
function decorators—they assign function attributes to record information for later use 
by an API, but they do not insert a wrapper layer to intercept later calls: 


# Augmenting decorated objects directly

>>> def decorate(func):
...     func.marked = True                     # Assign function attribute for later use
...     return func
...
>>> @decorate
... def spam(a, b):
...     return a + b
...
>>> spam.marked
True

>>> def annotate(text):                        # Same, but value is decorator argument
...     def decorate(func):
...         func.label = text
...         return func
...     return decorate
...
>>> @annotate('spam data')
... def spam(a, b):                            # spam = annotate(...)(spam)
...     return a + b
...
>>> spam(1, 2), spam.label
(3, 'spam data')


Such decorators augment functions and classes directly, without catching later calls to 
them. We’ll see more examples of class decorations managing classes directly in the 
next chapter, because this turns out to encroach on the domain of metaclasses; for the 
remainder of this chapter, let’s turn to two larger case studies of decorators at work. 
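
Before moving on, here is a brief sketch of the class side of this idea, which the function examples above don't show: a class decorator that inserts a new class attribute and a new method, and then returns the original class rather than a wrapper. The describable decorator and the names it adds are invented for this illustration:

```python
def describable(aClass):                      # Hypothetical class decorator
    def describe(self):                       # New method, added dynamically
        return '%s: %s' % (aClass.__name__, sorted(self.__dict__.items()))
    aClass.describe = describe                # Insert a new method
    aClass.tagged = True                      # Insert a new class attribute
    return aClass                             # Same class, no wrapper layer

@describable
class Person:
    def __init__(self, name, rate):
        self.name = name
        self.rate = rate

bob = Person('Bob', 50)
print(type(bob).__name__, Person.tagged)      # Person True
print(bob.describe())                         # Person: [('name', 'Bob'), ('rate', 50)]
```

Because no wrapper is inserted, instances retain their original type and later calls incur no extra overhead.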


Example: “Private” and “Public” Attributes 


The final two sections of this chapter present larger examples of decorator use. Both 
are presented with minimal description, partly because this chapter has exceeded its 
size limits, but mostly because you should already understand decorator basics well 
enough to study these on your own. Being general-purpose tools, these examples give 
us a chance to see how decorator concepts come together in more useful code. 


Implementing Private Attributes 


The following class decorator implements a Private declaration for class instance at- 
tributes—that is, attributes stored on an instance, or inherited from one of its classes. 
It disallows fetch and change access to such attributes from outside the decorated class, 
but still allows the class itself to access those names freely within its methods. It’s not 
exactly C++ or Java, but it provides similar access control as an option in Python. 


We saw an incomplete first-cut implementation of instance attribute privacy for 
changes only in Chapter 29. The version here extends this concept to validate attribute 
fetches too, and it uses delegation instead of inheritance to implement the model. In 
fact, in a sense this is just an extension to the attribute tracer class decorator we met 
earlier. 


Although this example utilizes the new syntactic sugar of class decorators to code at- 
tribute privacy, its attribute interception is ultimately still based upon the 
__getattr__ and __setattr__ operator overloading methods we met in prior chapters.
When a private attribute access is detected, this version uses the raise statement to 
raise an exception, along with an error message; the exception may be caught in a 
try or allowed to terminate the script. 


Here is the code, along with a self test at the bottom of the file. It will work under both 
Python 2.6 and 3.0 because it employs 3.0 print and raise syntax, though it catches 
operator overloading method attributes in 2.6 only (more on this in a moment): 


"""
Privacy for attributes fetched from class instances.
See self-test code at end of file for a usage example.
Decorator same as: Doubler = Private('data', 'size')(Doubler).
Private returns onDecorator, onDecorator returns onInstance,
and each onInstance instance embeds a Doubler instance.
"""

traceMe = False
def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def Private(*privates):                              # privates in enclosing scope
    def onDecorator(aClass):                         # aClass in enclosing scope
        class onInstance:                            # wrapped in instance attribute
            def __init__(self, *args, **kargs):
                self.wrapped = aClass(*args, **kargs)
            def __getattr__(self, attr):             # My attrs don't call getattr
                trace('get:', attr)                  # Others assumed in wrapped
                if attr in privates:
                    raise TypeError('private attribute fetch: ' + attr)
                else:
                    return getattr(self.wrapped, attr)
            def __setattr__(self, attr, value):      # Outside accesses
                trace('set:', attr, value)           # Others run normally
                if attr == 'wrapped':                # Allow my attrs
                    self.__dict__[attr] = value      # Avoid looping
                elif attr in privates:
                    raise TypeError('private attribute change: ' + attr)
                else:
                    setattr(self.wrapped, attr, value)   # Wrapped obj attrs
        return onInstance                            # Or use __dict__
    return onDecorator


if __name__ == '__main__':
    traceMe = True

    @Private('data', 'size')                   # Doubler = Private(...)(Doubler)
    class Doubler:
        def __init__(self, label, start):
            self.label = label                 # Accesses inside the subject class
            self.data = start                  # Not intercepted: run normally
        def size(self):
            return len(self.data)              # Methods run with no checking
        def double(self):                      # Because privacy not inherited
            for i in range(self.size()):
                self.data[i] = self.data[i] * 2
        def display(self):
            print('%s => %s' % (self.label, self.data))

    X = Doubler('X is', [1, 2, 3])
    Y = Doubler('Y is', [-10, -20, -30])

    # The following all succeed
    print(X.label)                             # Accesses outside subject class
    X.display(); X.double(); X.display()       # Intercepted: validated, delegated
    print(Y.label)
    Y.display(); Y.double()
    Y.label = 'Spam'
    Y.display()

    # The following all fail properly
    """
    print(X.size())          # prints "TypeError: private attribute fetch: size"
    print(X.data)
    X.data = [1, 1, 1]
    X.size = lambda S: 0
    print(Y.data)
    print(Y.size())
    """


When traceMe is True, the module file’s self-test code produces the following output. 
Notice how the decorator catches and validates both attribute fetches and assignments 
run outside of the wrapped class, but does not catch attribute accesses inside the class 
itself: 

[set: wrapped <__main__.Doubler object at 0x02B2AAF0>]
[set: wrapped <__main__.Doubler object at 0x02B2AE70>]
[get: label]
X is
[get: display]
X is => [1, 2, 3]
[get: double]
[get: display]
X is => [2, 4, 6]
[get: label]
Y is
[get: display]
Y is => [-10, -20, -30]
[get: double]
[set: label Spam]
[get: display]
Spam => [-20, -40, -60] 
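
The failing accesses are commented out in the self-test so that the script runs to completion. If you would rather see them fail without ending the program, you can catch the exceptions in a try, as the following sketch does. It uses a condensed copy of the decorator with tracing left out and a cut-down Doubler class, and the errors list is just a convenience of mine for collecting the messages:

```python
def Private(*privates):                       # Condensed copy: tracing omitted
    def onDecorator(aClass):
        class onInstance:
            def __init__(self, *args, **kargs):
                self.wrapped = aClass(*args, **kargs)
            def __getattr__(self, attr):
                if attr in privates:
                    raise TypeError('private attribute fetch: ' + attr)
                return getattr(self.wrapped, attr)
            def __setattr__(self, attr, value):
                if attr == 'wrapped':
                    self.__dict__[attr] = value
                elif attr in privates:
                    raise TypeError('private attribute change: ' + attr)
                else:
                    setattr(self.wrapped, attr, value)
        return onInstance
    return onDecorator

@Private('data', 'size')
class Doubler:
    def __init__(self, label, start):
        self.label = label
        self.data = start
    def size(self):
        return len(self.data)

X = Doubler('X is', [1, 2, 3])
errors = []
try:
    X.data                                    # Fetch from outside the class
except TypeError as exc:
    errors.append(str(exc))
try:
    X.data = [1, 1, 1]                        # Change from outside the class
except TypeError as exc:
    errors.append(str(exc))

print(errors)
print(X.label)                                # Public names still work
```

Both private accesses raise TypeError with the messages coded in the decorator, while the public label attribute is fetched normally.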


Implementation Details I


This code is a bit complex, and you’re probably best off tracing through it on your own
to see how it works. To help you study, though, here are a few highlights worth 
mentioning. 


Inheritance versus delegation 


The first-cut privacy example shown in Chapter 29 used inheritance to mix in a 
__setattr__ to catch accesses. Inheritance makes this difficult, however, because dif- 
ferentiating between accesses from inside or outside the class is not straightforward 
(inside access should be allowed to run normally, and outside access should be restric- 
ted). To work around this, the Chapter 29 example requires inheriting classes to use 
__dict__ assignments to set attributes—an incomplete solution at best. 


The version here uses delegation (embedding one object inside another) instead of
inheritance; this pattern is better suited to our task, as it makes it much easier to
distinguish between accesses inside and outside of the subject class. Attribute accesses
from outside the subject class are intercepted by the wrapper layer’s overloading methods
and delegated to the class if valid; accesses inside the class itself (i.e., through self
inside its methods’ code) are not intercepted and are allowed to run normally without
checks, because privacy is not inherited here.


Decorator arguments


The class decorator used here accepts any number of arguments, to
name private attributes. What really happens, though, is that the arguments are passed
to the Private function, and Private returns the decorator function to be applied to the
subject class. That is, the arguments are used before decoration ever occurs; Private
returns the decorator, which in turn “remembers” the privates list as an enclosing scope
reference.
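The two-step call order described above can be seen in a stripped-down sketch. This is not the chapter's Private; the declared_private attribute is a hypothetical stand-in used only to show that the arguments arrive first and the class arrives second:

```python
def Private(*privates):                     # Step 1: receives the decorator arguments
    def onDecorator(aClass):                # Step 2: receives the class being decorated
        # Hypothetical marker attribute, for illustration only
        aClass.declared_private = privates  # privates comes from the enclosing scope
        return aClass
    return onDecorator

@Private('data', 'size')                    # Doubler = Private('data', 'size')(Doubler)
class Doubler:
    pass

class Manual:
    pass
Manual = Private('data', 'size')(Manual)    # The same rebinding, coded by hand
```

Both classes end up with the same declared names, which suggests that the @ line is only shorthand for calling Private's result on the class.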


State retention and enclosing scopes 


Speaking of enclosing scopes, there are actually three levels of state retention at work 
in this code: 


• The arguments to Private are used before decoration occurs and are retained as
an enclosing scope reference for use in both onDecorator and onInstance.


• The class argument to onDecorator is used at decoration time and is retained as an
enclosing scope reference for use at instance construction time.


• The wrapped instance object is retained as an instance attribute in onInstance, for
use when attributes are later accessed from outside the class.


This all works fairly naturally, given Python’s scope and namespace rules. 
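To make the three levels concrete, here is a minimal sketch that keeps only the state-retention skeleton of the decorator. The state method is a hypothetical helper added for inspection; it is not part of the chapter's code, and no access checks are performed here:

```python
def Private(*privates):                          # Level 1: decorator arguments
    def onDecorator(aClass):                     # Level 2: the decorated class
        class onInstance:
            def __init__(self, *args, **kargs):
                self.wrapped = aClass(*args, **kargs)   # Level 3: per-instance state
            def state(self):                     # Hypothetical helper, for illustration
                return privates, aClass, self.wrapped
        return onInstance
    return onDecorator

@Private('data', 'size')
class Doubler:
    def __init__(self, label):
        self.label = label

X = Doubler('X is')
Y = Doubler('Y is')
```

The privates tuple and the original class are shared by every instance through enclosing scopes, while each wrapper instance holds its own wrapped object.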


Using __dict__ and __slots__


The __setattr__ in this code relies on an instance object’s __dict__ attribute namespace
dictionary in order to set onInstance’s own wrapped attribute. As we learned in the prior
chapter, it cannot assign an attribute directly without looping. However, it uses the
setattr built-in instead of __dict__ to set attributes in the wrapped object itself.
Moreover, getattr is used to fetch attributes in the wrapped object, since they may be
stored in the object itself or inherited by it.


Because of that, this code will work for most classes. You may recall from Chapter 31
that new-style classes with __slots__ may not store attributes in a __dict__. However,
because we only rely on a __dict__ at the onInstance level here, not in the wrapped
instance, and because setattr and getattr apply to attributes based on both
__dict__ and __slots__, our decorator applies to classes using either storage scheme.
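As a rough check of this claim, the following sketch wraps a slots-based object in a simplified wrapper. Wrapper and Slotted are hypothetical names, and the access tests are left out; only the storage behavior is shown:

```python
class Slotted(object):
    __slots__ = ['x']                        # No per-instance __dict__ is created
    def __init__(self):
        self.x = 1

class Wrapper:                               # Hypothetical stand-in for onInstance
    def __init__(self, obj):
        self.__dict__['wrapped'] = obj       # Wrapper's own state: via __dict__
    def __getattr__(self, attr):
        return getattr(self.wrapped, attr)   # Works for slots or __dict__ storage
    def __setattr__(self, attr, value):
        setattr(self.wrapped, attr, value)   # Likewise for assignments

w = Wrapper(Slotted())
w.x = 99                                     # Routed to the slot in the wrapped object
```

Only the wrapper needs a __dict__ of its own; the wrapped object is reached purely through the getattr and setattr built-ins.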


Generalizing for Public Declarations, Too 


Now that we have a Private implementation, it’s straightforward to generalize the code
to allow for Public declarations too; they are essentially the inverse of Private
declarations, so we need only negate the inner test. The example listed in this section
allows a class to use decorators to define a set of either Private or Public instance
attributes (attributes stored on an instance or inherited from its classes), with the
following semantics:


• Private declares attributes of a class’s instances that cannot be fetched or assigned,
except from within the code of the class’s methods. That is, any name declared
Private cannot be accessed from outside the class, while any name not declared
Private can be freely fetched or assigned from outside the class.


• Public declares attributes of a class’s instances that can be fetched or assigned from
both outside the class and within the class’s methods. That is, any name declared
Public can be freely accessed anywhere, while any name not declared Public cannot
be accessed from outside the class.


Private and Public declarations are intended to be mutually exclusive: when using
Private, all undeclared names are considered Public, and when using Public, all
undeclared names are considered Private. They are essentially inverses, though
undeclared names not created by class methods behave slightly differently: they can be
assigned and thus created outside the class under Private (all undeclared names are
accessible), but not under Public (all undeclared names are inaccessible).


Again, study this code on your own to get a feel for how this works. Notice that this
scheme adds an additional fourth level of state retention at the top, beyond that
described in the preceding section: the test functions used by the lambdas are saved in
an extra enclosing scope. This example is coded to run under either Python 2.6 or 3.0,
though it comes with a caveat when run under 3.0 (explained briefly in the file’s
docstring and expanded on after the code):


"""
Class decorator with Private and Public attribute declarations.
Controls access to attributes stored on an instance, or inherited
by it from its classes. Private declares attribute names that
cannot be fetched or assigned outside the decorated class, and
Public declares all the names that can. Caveat: this works in
3.0 for normally named attributes only: __X__ operator overloading
methods implicitly run for built-in operations do not trigger
either __getattr__ or __getattribute__ in new-style classes.
Add __X__ methods here to intercept and delegate built-ins.
"""

traceMe = False
def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance:
            def __init__(self, *args, **kargs):
                self.__wrapped = aClass(*args, **kargs)
            def __getattr__(self, attr):
                trace('get:', attr)
                if failIf(attr):
                    raise TypeError('private attribute fetch: ' + attr)
                else:
                    return getattr(self.__wrapped, attr)
            def __setattr__(self, attr, value):
                trace('set:', attr, value)
                if attr == '_onInstance__wrapped':
                    self.__dict__[attr] = value
                elif failIf(attr):
                    raise TypeError('private attribute change: ' + attr)
                else:
                    setattr(self.__wrapped, attr, value)
        return onInstance
    return onDecorator

def Private(*attributes):
    return accessControl(failIf=(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=(lambda attr: attr not in attributes))


See the prior example’s self-test code for a usage example. Here’s a quick look at these 
class decorators in action at the interactive prompt (they work the same in 2.6 and 3.0); 
as advertised, non-Private or Public names can be fetched and changed from outside 
the subject class, but Private or non-Public names cannot: 


>>> from access import Private, Public

>>> @Private('age')                             # Person = Private('age')(Person)
... class Person:                               # Person = onInstance with state
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age                      # Inside accesses run normally
...
>>> X = Person('Bob', 40)
>>> X.name                                      # Outside accesses validated
'Bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age

>>> @Public('name')
... class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...
>>> X = Person('bob', 40)                       # X is an onInstance
>>> X.name                                      # onInstance embeds Person
'bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age


Implementation Details II 


To help you analyze the code, here are a few final notes on this version. Since this is 
just a generalization of the preceding section’s example, most of the notes there apply 
here as well. 


Using __X pseudoprivate names 


Besides generalizing, this version also makes use of Python’s __X pseudoprivate name
mangling feature (which we met in Chapter 30) to localize the wrapped attribute to the
control class, by automatically prefixing it with the class name. This avoids the prior
version’s risk for collisions with a wrapped attribute that may be used by the real,
wrapped class, and it’s useful in a general tool like this. It’s not quite “privacy,”
though, because the mangled name can be used freely outside the class. Notice that we
also have to use the fully expanded name string ('_onInstance__wrapped') in __setattr__,
because that’s what Python changes it to.
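A short sketch shows the expansion at work. The class here reuses the name onInstance only so that the mangled name matches the one in the text; it is otherwise a bare illustration with no access control:

```python
class onInstance:
    def __init__(self, value):
        self.__wrapped = value               # Stored as _onInstance__wrapped

x = onInstance(42)
names = list(vars(x))                        # Names actually stored on the instance
mangled = x._onInstance__wrapped             # The expanded name works from outside
```

Inside the class statement, Python rewrites __wrapped automatically; in a string such as the one tested by __setattr__, no rewriting occurs, so the expanded form must be spelled out by hand.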


Breaking privacy 


Although this example does implement access controls for attributes of an instance and
its classes, it is possible to subvert these controls in various ways: for instance, by
going through the expanded version of the wrapped attribute explicitly (bob.pay might
not work, but the fully mangled bob._onInstance__wrapped.pay could!). If you have to
explicitly try to do so, though, these controls are probably sufficient for normal
intended use. Of course, privacy controls can generally be subverted in any language
if you try hard enough (#define private public may work in some C++ implementations,
too). Although access controls can reduce accidental changes, much of this is up
to programmers in any language; whenever source code may be changed, access control
will always be a bit of a pipe dream.


Decorator tradeoffs 


We could again achieve the same results without decorators, by using manager
functions or coding the name rebinding of decorators manually; the decorator syntax,
however, makes this consistent and a bit more obvious in the code. The chief potential
downsides of this and any other wrapper-based approach are that attribute access
incurs an extra call, and instances of decorated classes are not really instances of the
original decorated class; if you test their type with X.__class__ or isinstance(X, C),
for example, you'll find that they are instances of the wrapper class. Unless you plan
to do introspection on objects’ types, though, the type issue is probably irrelevant.
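The type issue is easy to observe with a reduced wrapper. The wrap decorator below is a hypothetical minimal version with no access tests; it keeps only the rebinding that causes the effect:

```python
def wrap(aClass):                            # Hypothetical minimal wrapper decorator
    class onInstance:
        def __init__(self, *args, **kargs):
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attr):
            return getattr(self.wrapped, attr)
    return onInstance                        # The class name is rebound to this

@wrap
class Person:
    def __init__(self, name):
        self.name = name

bob = Person('Bob')                          # Really makes an onInstance
original = type(bob.wrapped)                 # The class coded in the source file
```

After decoration, the name Person refers to the wrapper class, so type tests against that name succeed, but tests against the original class object do not.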


Open Issues 


As is, this example works as planned under Python 2.6 and 3.0 (provided operator
overloading methods to be delegated are redefined in the wrapper). As with most
software, though, there is always room for improvement.


Caveat: operator overloading methods fail to delegate under 3.0 


Like all delegation-based classes that use __getattr__, this decorator works
cross-version for normally named attributes only; operator overloading methods like
__str__ and __add__ work differently for new-style classes and so fail to reach the
embedded object if defined there when this runs under 3.0.


As we learned in the prior chapter, classic classes look up operator overloading names
in instances at runtime normally, but new-style classes do not; they skip the instance
entirely and look up such methods in classes. Hence, the __X__ operator overloading
methods implicitly run for built-in operations do not trigger either __getattr__ or
__getattribute__ in new-style classes in 2.6 and all classes in 3.0; such attribute fetches
skip our onInstance.__getattr__ altogether, so they cannot be validated or delegated.
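The difference between an explicit name fetch and an implicit built-in operation can be sketched as follows. Wrapper and Number are hypothetical names, and the results described apply to Python 3.X, where every class is new-style:

```python
class Wrapper:                               # Hypothetical delegation-based class
    def __init__(self, obj):
        self.obj = obj
    def __getattr__(self, attr):             # Runs for undefined names only
        return getattr(self.obj, attr)

class Number:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return self.n + other

w = Wrapper(Number(5))
explicit = w.__add__(10)                     # Explicit fetch: routed to __getattr__

try:
    implicit = w + 10                        # Built-in operation: skips __getattr__
except TypeError:
    implicit = 'TypeError'
```

The explicit fetch by name is delegated normally, but the + expression looks for __add__ in the wrapper's class only and fails when it is absent there.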


Our decorator’s class is not coded as new-style (by deriving from object), so it will 
catch operator overloading methods if run under 2.6. Since all classes are new-style 
automatically in 3.0, though, such methods will fail if they are coded on the embedded 
object. The simplest workaround in 3.0 is to redefine redundantly in onInstance all the 
operator overloading methods that can possibly be used in wrapped objects. Such extra 
methods can be added by hand, by tools that partly automate the task (e.g., with class 
decorators or the metaclasses discussed in the next chapter), or by definition in 
superclasses. 


To see the difference yourself, try applying the decorator to a class that uses operator
overloading methods under 2.6; validations work as before, and both the __str__
method used by printing and the __add__ method run for + invoke the decorator’s
__getattr__ and hence wind up being validated and delegated to the subject Person
object correctly:


C:\misc> c:\python26\python
>>> from access import Private
>>> @Private('age')
... class Person:
...     def __init__(self):
...         self.age = 42
...     def __str__(self):
...         return 'Person: ' + str(self.age)
...     def __add__(self, yrs):
...         self.age += yrs
...
>>> X = Person()
>>> X.age                                  # Name validations fail correctly
TypeError: private attribute fetch: age
>>> print(X)                               # __getattr__ => runs Person.__str__
Person: 42
>>> X + 10                                 # __getattr__ => runs Person.__add__
>>> print(X)                               # __getattr__ => runs Person.__str__
Person: 52


When the same code is run under Python 3.0, though, the implicitly invoked __str__
and __add__ skip the decorator’s __getattr__ and look for definitions in or above the
decorator class itself; print winds up finding the default display inherited from the class
type (technically, from the implied object superclass in 3.0), and + generates an error
because no default is inherited:


C:\misc> c:\python30\python
>>> from access import Private
>>> @Private('age')
... class Person:
...     def __init__(self):
...         self.age = 42
...     def __str__(self):
...         return 'Person: ' + str(self.age)
...     def __add__(self, yrs):
...         self.age += yrs
...
>>> X = Person()                           # Name validations still work
>>> X.age                                  # But 3.0 fails to delegate built-ins!
TypeError: private attribute fetch: age
>>> print(X)
<access.onInstance object at 0x025E0790>
>>> X + 10
TypeError: unsupported operand type(s) for +: 'onInstance' and 'int'
>>> print(X)
<access.onInstance object at 0x025E0790>


Using the alternative __getattribute__ method won’t help here: although it is defined
to catch every attribute reference (not just undefined names), it is also not run by
built-in operations. Python’s property feature, which we met in Chapter 37, won’t help
here either; recall that properties are automatically run code associated with specific
attributes defined when a class is written, and are not designed to handle arbitrary
attributes in wrapped objects.


As mentioned earlier, the most straightforward solution under 3.0 is to redundantly 
redefine operator overloading names that may appear in embedded objects in 
delegation-based classes like our decorator. This isn’t ideal because it creates some code 
redundancy, especially compared to 2.6 solutions. However, it isn’t too major a coding 
effort, can be automated to some extent with tools or superclasses, suffices to make 
our decorator work in 3.0, and allows operator overloading names to be declared 
Private or Public too (assuming each overloading method runs the failIf test 
internally): 


def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance:
            def __init__(self, *args, **kargs):
                self.__wrapped = aClass(*args, **kargs)

            # Intercept and delegate operator overloading methods
            def __str__(self):
                return str(self.__wrapped)
            def __add__(self, other):
                return self.__wrapped + other
            def __getitem__(self, index):
                return self.__wrapped[index]                # If needed
            def __call__(self, *args, **kargs):
                return self.__wrapped(*args, **kargs)       # If needed
            # ...plus any others needed...

            # Intercept and delegate named attributes
            def __getattr__(self, attr):
                ...
            def __setattr__(self, attr, value):
                ...
        return onInstance
    return onDecorator


With such operator overloading methods added, the prior example with __str__ and
__add__ works the same under 2.6 and 3.0, although a substantial amount of extra code
may be required to accommodate 3.0; in principle, every operator overloading method
that is not run automatically will need to be defined redundantly for 3.0 in a general
tool class like this (which is why this extension is omitted in our code). Since every class
is new-style in 3.0, delegation-based code is more difficult (though not impossible) in
this release.


On the other hand, delegation wrappers could simply inherit from a common superclass
that redefines operator overloading methods once, with standard delegation code.
Moreover, tools such as additional class decorators or metaclasses might automate
some of the work of adding such methods to delegation classes (see the class
augmentation examples in Chapter 39 for details). Though still not as simple as the 2.6
solution, such techniques might help make 3.0 delegation classes more general.
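One way the common-superclass idea might look is sketched below. The names BuiltinsMixin, Wrapper, and _delegate are assumptions made for this illustration and do not appear in the book's code; only four operator methods are covered, and no access checks are run:

```python
class BuiltinsMixin:                         # Hypothetical shared superclass
    def __str__(self):
        return str(self._delegate())
    def __add__(self, other):
        return self._delegate() + other
    def __getitem__(self, index):
        return self._delegate()[index]
    def __call__(self, *args, **kargs):
        return self._delegate()(*args, **kargs)

class Wrapper(BuiltinsMixin):                # Any delegation class can inherit these
    def __init__(self, obj):
        self.__dict__['_obj'] = obj
    def _delegate(self):                     # Tells the mixin where to route calls
        return self._obj
    def __getattr__(self, attr):             # Named attributes: delegated as before
        return getattr(self._obj, attr)

w = Wrapper([1, 2, 3])
```

Because the operator methods now live in a class above the wrapper, built-in operations find them by normal inheritance, and each wrapper class needs to supply only the _delegate hook.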


Implementation alternatives: __getattribute__ inserts, call stack inspection


Although redundantly defining operator overloading methods in wrappers is probably
the most straightforward workaround to the Python 3.0 dilemma outlined in the prior
section, it’s not necessarily the only one. We don’t have space to explore this issue
much further here, so investigating other potential solutions is relegated to a suggested
exercise. Because one dead-end alternative underscores class concepts well, though, it
merits a brief mention.


One downside of this example is that instance objects are not truly instances of the
original class; they are instances of the wrapper instead. In some programs that rely
on type testing, this might matter. To support such cases, we might try to achieve similar
effects by inserting a __getattribute__ method into the original class, to catch every
attribute reference made on its instances. This inserted method would pass valid
requests up to its superclass to avoid loops, using the techniques we studied in the prior
chapter. Here is the potential change to our class decorator’s code:


# trace support as before

def accessControl(failIf):
    def onDecorator(aClass):
        def getattributes(self, attr):
            trace('get:', attr)
            if failIf(attr):
                raise TypeError('private attribute fetch: ' + attr)
            else:
                return object.__getattribute__(self, attr)
        aClass.__getattribute__ = getattributes
        return aClass
    return onDecorator

def Private(*attributes):
    return accessControl(failIf=(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=(lambda attr: attr not in attributes))


This alternative addresses the type-testing issue but suffers from others. For example,
it handles only attribute fetches; as is, this version allows private names to be
assigned freely. Intercepting assignments would still have to use __setattr__, and either
an instance wrapper object or another class method insertion. Adding an instance
wrapper to catch assignments would change the type again, and inserting methods fails
if the original class is using a __setattr__ of its own (or a __getattribute__, for that
matter!). An inserted __setattr__ would also have to allow for a __slots__ in the client
class.


In addition, this scheme does not address the built-in operation attributes issue
described in the prior section, since __getattribute__ is not run in these contexts,
either. In our case, if Person had a __str__ it would be run by print operations, but only
because it was actually present in that class. As before, the __str__ attribute would
not be routed to the inserted __getattribute__ method generically; printing would
bypass this method altogether and call the class’s __str__ directly.


Although this is probably better than not supporting operator overloading methods in
a wrapped object at all (barring redefinition, at least), this scheme still cannot intercept
and validate __X__ methods, making it impossible for any of them to be Private.
Although most operator overloading methods are meant to be public, some might not be.


Much worse, because this nonwrapper approach works by adding a
__getattribute__ to the decorated class, it also intercepts attribute accesses made by
the class itself and validates them the same as accesses made from outside; this means
the class’s methods won’t be able to use Private names, either!
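This drawback can be demonstrated with a reduced form of the method-insertion idea. The private decorator below is a hypothetical simplification (no tracing, fetches only) written just to expose the effect on code inside the class:

```python
def private(*names):                         # Hypothetical simplified decorator
    def onDecorator(aClass):
        def getattributes(self, attr):
            if attr in names:
                raise TypeError('private attribute fetch: ' + attr)
            return object.__getattribute__(self, attr)
        aClass.__getattribute__ = getattributes
        return aClass                        # Same class: type tests still work
    return onDecorator

@private('age')
class Person(object):
    def __init__(self, age):
        self.age = age                       # Assignment: not intercepted here
    def birthday(self):
        return self.age + 1                  # Fetch inside the class: intercepted!

bob = Person(40)
try:
    bob.birthday()
    result = 'allowed'
except TypeError as exc:
    result = str(exc)
```

The instance is a true Person, but the method's own reference to self.age fails the privacy test, just as an access from outside the class would.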


In fact, inserting methods this way is functionally equivalent to inheriting them, and
implies the same constraints as our original Chapter 29 privacy code. To know whether
an attribute access originated inside or outside the class, our method might need to
inspect frame objects on the Python call stack. This might ultimately yield a solution
(replace private attributes with properties or descriptors that check the stack, for
example), but it would slow access further and is far too dark a magic for us to explore
here.


While interesting, and possibly relevant for some other use cases, this method insertion
technique doesn’t meet our goals. We won’t explore this option’s coding pattern
further here because we will study class augmentation techniques in the next chapter, in
conjunction with metaclasses. As we’ll see there, metaclasses are not strictly required
for changing classes this way, because class decorators can often serve the same role.


Python Isn’t About Control 


Now that I’ve gone to such great lengths to add Private and Public attribute
declarations for Python code, I must again remind you that it is not entirely Pythonic to
add access controls to your classes like this. In fact, most Python programmers will
probably find this example to be largely or totally irrelevant, apart from serving as a
demonstration of decorators in action. Most large Python programs get by successfully
without any such controls at all. If you do wish to regulate attribute access in order to
eliminate coding mistakes, though, or happen to be a soon-to-be-ex-C++-or-Java programmer,
most things are possible with Python’s operator overloading and introspection tools.


Example: Validating Function Arguments 


As a final example of the utility of decorators, this section develops a function
decorator that automatically tests whether arguments passed to a function or method are
within a valid numeric range. It’s designed to be used during either development or
production, and it can be used as a template for similar tasks (e.g., argument type
testing, if you must). Because this chapter’s size limits have been breached, this
example’s code is largely self-study material, with limited narrative; as usual, browse
the code for more details.


The Goal 


In the object-oriented tutorial of Chapter 27, we wrote a class that gave a raise to objects 
representing people based upon a passed-in percentage: 


class Person:
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))


There, we noted that if we wanted the code to be robust it would be a good idea to
check the percentage to make sure it’s not too large or too small. We could implement
such a check with either if or assert statements in the method itself, using inline tests:


class Person:
    def giveRaise(self, percent):             # Validate with inline code
        if percent < 0.0 or percent > 1.0:
            raise TypeError('percent invalid')
        self.pay = int(self.pay * (1 + percent))

class Person:                                 # Validate with asserts
    def giveRaise(self, percent):
        assert percent >= 0.0 and percent <= 1.0, 'percent invalid'
        self.pay = int(self.pay * (1 + percent))


However, this approach clutters up the method with inline tests that will probably be 
useful only during development. For more complex cases, this can become tedious 
(imagine trying to inline the code needed to implement the attribute privacy provided 
by the last section’s decorator). Perhaps worse, if the validation logic ever needs to 
change, there may be arbitrarily many inline copies to find and update. 


A more useful and interesting alternative would be to develop a general tool that can 
perform range tests for us automatically, for the arguments of any function or method 
we might code now or in the future. A decorator approach makes this explicit and 
convenient: 


class Person:
    @rangetest(percent=(0.0, 1.0))            # Use decorator to validate
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))


Isolating validation logic in a decorator simplifies both clients and future maintenance. 


Notice that our goal here is different from the attribute validations coded in the prior
chapter’s final example. Here, we mean to validate the values of function arguments
when passed, rather than attribute values when set. Python’s decorator and
introspection tools allow us to code this new task just as easily.


A Basic Range-Testing Decorator for Positional Arguments 


Let’s start with a basic range test implementation. To keep things simple, we’ll begin
by coding a decorator that works only for positional arguments and assumes they
always appear at the same position in every call; they cannot be passed by keyword name,
and we don’t support additional **args keywords in calls because this can invalidate
the positions declared in the decorator. Code the following in a file called devtools.py:


def rangetest(*argchecks):                    # Validate positional arg ranges
    def onDecorator(func):
        if not __debug__:                     # True if "python -O main.py args..."
            return func                       # No-op: call original directly
        else:                                 # Else wrapper while debugging
            def onCall(*args):
                for (ix, low, high) in argchecks:
                    if args[ix] < low or args[ix] > high:
                        errmsg = 'Argument %s not in %s..%s' % (ix, low, high)
                        raise TypeError(errmsg)
                return func(*args)
            return onCall
    return onDecorator


As is, this code is mostly a rehash of the coding patterns we explored earlier: we use 
decorator arguments, nested scopes for state retention, and so on. 


We also use nested def statements to ensure that this works for both simple functions
and methods, as we learned earlier. When used for a class method, onCall receives the
subject class’s instance in the first item in *args and passes this along to self in the
original method function; argument numbers in range tests start at 1 in this case, not 0.


Also notice this code’s use of the __debug__ built-in variable: Python sets this
to True, unless it’s being run with the -O optimize command-line flag (e.g., python -O
main.py). When __debug__ is False, the decorator returns the original function
unchanged, to avoid extra calls and their associated performance penalty.
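The __debug__ switch can be isolated in a smaller decorator. The positive decorator and area function below are hypothetical examples, not part of devtools.py; they show only how the wrapper is skipped when Python runs with -O:

```python
def positive(func):                          # Hypothetical decorator, for illustration
    if not __debug__:                        # python -O: no wrapper, no checks
        return func
    def onCall(*args):                       # Normal runs: validate, then delegate
        for arg in args:
            if arg <= 0:
                raise TypeError('argument not positive: %r' % (arg,))
        return func(*args)
    return onCall

def area(width, height):
    return width * height

checked = positive(area)                     # Same as decorating area with @positive
same_object = checked is area                # True only when run under python -O
```

Since the test on __debug__ runs once at decoration time, calls under -O go straight to the original function with no per-call overhead.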


This first iteration solution is used as follows: 


# File devtools_test.py

from devtools import rangetest
print(__debug__)                          # False if "python -O main.py"

@rangetest((1, 0, 120))                   # persinfo = rangetest(...)(persinfo)
def persinfo(name, age):                  # age must be in 0..120
    print('%s is %s years old' % (name, age))

@rangetest([0, 1, 12], [1, 1, 31], [2, 0, 2009])
def birthday(M, D, Y):
    print('birthday = {0}/{1}/{2}'.format(M, D, Y))

class Person:
    def __init__(self, name, job, pay):
        self.job = job
        self.pay = pay

    @rangetest([1, 0.0, 1.0])             # giveRaise = rangetest(...)(giveRaise)
    def giveRaise(self, percent):         # Arg 0 is the self instance here
        self.pay = int(self.pay * (1 + percent))

# Comment lines raise TypeError unless "python -O" used on shell command line

persinfo('Bob Smith', 45)                 # Really runs onCall(...) with state
#persinfo('Bob Smith', 200)               # Or persinfo if -O cmd line argument

birthday(5, 31, 1963)
#birthday(5, 32, 1963)

sue = Person('Sue Jones', 'dev', 100000)
sue.giveRaise(.10)                        # Really runs onCall(self, .10)
print(sue.pay)                            # Or giveRaise(self, .10) if -O
#sue.giveRaise(1.10)
#print(sue.pay)


When run, valid calls in this code produce the following output (all the code in this 
section works the same under Python 2.6 and 3.0, because function decorators are 
supported in both, we’re not using attribute delegation, and we use 3.0-style print calls 
and exception construction syntax): 

C:\misc> C:\python30\python devtools_test.py
True
Bob Smith is 45 years old
birthday = 5/31/1963
110000


Uncommenting any of the invalid calls causes a TypeError to be raised by the decorator. 
Here’s the result when the last two lines are allowed to run (as usual, I’ve omitted some 
of the error message text here to save space): 

C:\misc> C:\python30\python devtools_test.py
True
Bob Smith is 45 years old
birthday = 5/31/1963
110000
TypeError: Argument 1 not in 0.0..1.0


Running Python with its -O flag at a system command line will disable range testing,
but also avoid the performance overhead of the wrapping layer; we wind up calling
the original undecorated function directly. Assuming this is a debugging tool only, you
can use this flag to optimize your program for production use:

C:\misc> C:\python30\python -O devtools_test.py
False
Bob Smith is 45 years old
birthday = 5/31/1963
110000
231000


Generalizing for Keywords and Defaults, Too 


The prior version illustrates the basics we need to employ, but it’s fairly limited: it
supports validating arguments passed by position only, and it does not validate
keyword arguments (in fact, it assumes that no keywords are passed in a way that makes
argument position numbers incorrect). Additionally, it does nothing about arguments
with defaults that may be omitted in a given call. That’s fine if all your arguments are
passed by position and never defaulted, but less than ideal in a general tool. Python
supports much more flexible argument-passing modes, which we’re not yet addressing.


The mutation of our example shown next does better. By matching the wrapped
function’s expected arguments against the actual arguments passed in a call, it supports
range validations for arguments passed by either position or keyword name, and it skips
testing for default arguments omitted in the call. In short, arguments to be validated
are specified by keyword arguments to the decorator, which later steps through both
the *pargs positionals tuple and the **kargs keywords dictionary to validate.
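The matching step relies on the code object attached to every function. As a small sketch of that introspection, run here on a hypothetical function rather than inside the decorator, the expected argument names can be extracted and paired with the number of positionals passed:

```python
def omitargs(a, b=7, c=8, d=9):
    total = a + b + c + d                    # A local: also listed in co_varnames
    return total

code = omitargs.__code__
allargs = code.co_varnames[:code.co_argcount]    # Argument names come first

def passed_by_position(pargs):               # Hypothetical helper, for illustration
    # The first len(pargs) expected names were matched by position
    return list(allargs)[:len(pargs)]
```

Slicing co_varnames by co_argcount drops local variable names, and slicing again by the length of the positionals tuple tells us which expected arguments were passed by position; any other checked name must arrive by keyword or be an omitted default.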


"""
File devtools.py: function decorator that performs range-test
validation for passed arguments. Arguments are specified by
keyword to the decorator. In the actual call, arguments may
be passed by position or keyword, and defaults may be omitted.
See devtools_test.py for example use cases.
"""

trace = True

def rangetest(**argchecks):                 # Validate ranges for both+defaults
    def onDecorator(func):                  # onCall remembers func and argchecks
        if not __debug__:                   # True if "python -O main.py args..."
            return func                     # Wrap if debugging; else use original
        else:
            import sys
            code = func.__code__
            allargs = code.co_varnames[:code.co_argcount]
            funcname = func.__name__

            def onCall(*pargs, **kargs):
                # All pargs match first N expected args by position
                # The rest must be in kargs or be omitted defaults
                positionals = list(allargs)
                positionals = positionals[:len(pargs)]

                for (argname, (low, high)) in argchecks.items():
                    # For all args to be checked
                    if argname in kargs:
                        # Was passed by name
                        if kargs[argname] < low or kargs[argname] > high:
                            errmsg = '{0} argument "{1}" not in {2}..{3}'
                            errmsg = errmsg.format(funcname, argname, low, high)
                            raise TypeError(errmsg)

                    elif argname in positionals:
                        # Was passed by position
                        position = positionals.index(argname)
                        if pargs[position] < low or pargs[position] > high:
                            errmsg = '{0} argument "{1}" not in {2}..{3}'
                            errmsg = errmsg.format(funcname, argname, low, high)
                            raise TypeError(errmsg)
                    else:
                        # Assume not passed: default
                        if trace:
                            print('Argument "{0}" defaulted'.format(argname))

                return func(*pargs, **kargs)    # OK: run original call
            return onCall
    return onDecorator


The following test script shows how the decorator is used—arguments to be validated 
are given by keyword decorator arguments, and at actual calls we can pass by name or 
position and omit arguments with defaults even if they are to be validated otherwise: 


# File devtools_test.py
# Comment lines raise TypeError unless "python -O" used on shell command line
from devtools import rangetest

# Test functions, positional and keyword

@rangetest(age=(0, 120))                  # persinfo = rangetest(...)(persinfo)
def persinfo(name, age):
    print('%s is %s years old' % (name, age))

@rangetest(M=(1, 12), D=(1, 31), Y=(0, 2009))
def birthday(M, D, Y):
    print('birthday = {0}/{1}/{2}'.format(M, D, Y))

persinfo('Bob', 40)
persinfo(age=40, name='Bob')
birthday(5, D=1, Y=1963)
#persinfo('Bob', 150)
#persinfo(age=150, name='Bob')
#birthday(5, D=40, Y=1963)

# Test methods, positional and keyword

class Person:
    def __init__(self, name, job, pay):
        self.job = job
        self.pay = pay
                                          # giveRaise = rangetest(...)(giveRaise)
    @rangetest(percent=(0.0, 1.0))        # percent passed by name or position
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

bob = Person('Bob Smith', 'dev', 100000)
sue = Person('Sue Jones', 'dev', 100000)
bob.giveRaise(.10)
sue.giveRaise(percent=.20)
print(bob.pay, sue.pay)
#bob.giveRaise(1.10)
#bob.giveRaise(percent=1.20)

# Test omitted defaults: skipped

@rangetest(a=(1, 10), b=(1, 10), c=(1, 10), d=(1, 10))
def omitargs(a, b=7, c=8, d=9):
    print(a, b, c, d)

omitargs(1, 2, 3, 4)
omitargs(1, 2, 3)
omitargs(1, 2, 3, d=4)
omitargs(1, d=4)
omitargs(d=4, a=1)
omitargs(1, b=2, d=4)
omitargs(d=8, c=7, a=1)

#omitargs(1, 2, 3, 11)         # Bad d
#omitargs(1, 2, 11)            # Bad c
#omitargs(1, 2, 3, d=11)       # Bad d
#omitargs(11, d=4)             # Bad a
#omitargs(d=4, a=11)           # Bad a
#omitargs(1, b=11, d=4)        # Bad b
#omitargs(d=8, c=7, a=11)      # Bad a


When this script is run, out-of-range arguments raise an exception as before, but ar- 
guments may be passed by either name or position, and omitted defaults are not vali- 
dated. This code runs on both 2.6 and 3.0, but extra tuple parentheses print in 2.6. 
Trace its output and test this further on your own to experiment; it works as before, 
but its scope has been broadened: 


C:\misc> C:\python30\python devtools_test.py
Bob is 40 years old
Bob is 40 years old
birthday = 5/1/1963
110000 120000
1 2 3 4
Argument "d" defaulted
1 2 3 9
1 2 3 4
Argument "c" defaulted
Argument "b" defaulted
1 7 8 4
Argument "c" defaulted
Argument "b" defaulted
1 7 8 4
Argument "c" defaulted
1 2 8 4
Argument "b" defaulted
1 7 7 8


On validation errors, we get an exception as before (unless the -O command-line
argument is passed to Python) when one of the method test lines is uncommented:


TypeError: giveRaise argument "percent" not in 0.0..1.0 


Implementation Details 


This decorator’s code relies on both introspection APIs and subtle constraints of
argument passing. To be fully general we could in principle try to mimic Python’s
argument matching logic in its entirety to see which names have been passed in which
modes, but that’s far too much complexity for our tool. It would be better if we could
somehow match arguments passed by name against the set of all expected arguments’
names, in order to determine which positions the passed arguments actually occupy in
a given call.


Function introspection 


It turns out that the introspection API available on function objects and their associated 
code objects has exactly the tool we need. This API was briefly introduced in Chap- 
ter 19, but we’ll actually put it to use here. The set of expected argument names is 
simply the first N variable names attached to a function’s code object: 


# In Python 3.0 (and 2.6 for compatibility):
>>> def func(a, b, c, d):
...     x = 1
...     y = 2
...
>>> code = func.__code__                   # Code object of function object
>>> code.co_nlocals
6
>>> code.co_varnames                       # All local var names
('a', 'b', 'c', 'd', 'x', 'y')
>>> code.co_varnames[:code.co_argcount]    # First N locals are expected args
('a', 'b', 'c', 'd')

>>> import sys                             # For backward compatibility
>>> sys.version_info                       # [0] is major release number
(3, 0, 0, 'final', 0)
>>> code = func.__code__ if sys.version_info[0] == 3 else func.func_code


The same API is available in older Pythons, but the func.__code__ attribute is spelled
as func.func_code in 2.5 and earlier (the newer __code__ attribute is also redundantly
available in 2.6 for portability). Run a dir call on function and code objects for more
details.
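As a minimal sketch of the slicing idea above, the helper below returns the declared argument names of any function. The names expected_args, sample, and Account are illustrative assumptions, not part of devtools.py, and the code assumes Python 3.X, where methods fetched from a class are simple functions with a __code__ attribute:

```python
def expected_args(func):
    """Return the names of func's declared arguments, in order."""
    code = func.__code__
    # co_argcount excludes *name and **name collectors, so they are sliced off
    return code.co_varnames[:code.co_argcount]


def sample(a, b, c=3, *extra, **options):
    x = 1                                   # A local, not an argument
    return a + b + c + x


class Account:
    def deposit(self, amount):
        return amount


print(expected_args(sample))                # Arguments only: no locals or collectors
print(expected_args(Account.deposit))       # self counts as the first argument
```

Note that self appears in a method's expected list. This is why the range-test decorator works on methods unchanged: the instance simply occupies the first position.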


Argument assumptions 


Given this set of expected argument names, the solution relies on two constraints on 
argument passing order imposed by Python (these still hold true in both 2.6 and 3.0): 


* At the call, all positional arguments appear before all keyword arguments.


* In the def, all nondefault arguments appear before all default arguments.


That is, a nonkeyword argument cannot generally follow a keyword argument at a call,
and a nondefault argument cannot follow a default argument at a definition. All
“name=value” syntax must appear after any simple “name” in both places.
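Both ordering rules are enforced by Python's compiler, so they can be checked without running any function. The following sketch uses the built-in compile to test small source strings; the helper name is_valid_syntax is an illustrative assumption:

```python
def is_valid_syntax(source):
    """Return True if source compiles, False if Python rejects its syntax."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


# At the call: positionals must precede keywords
print(is_valid_syntax('f(1, b=2)'))               # Accepted
print(is_valid_syntax('f(a=1, 2)'))               # Rejected

# In the def: nondefaults must precede defaults
print(is_valid_syntax('def f(a, b=2): pass'))     # Accepted
print(is_valid_syntax('def f(a=1, b): pass'))     # Rejected
```

Because the calls are only compiled and never run, the name f does not need to exist.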


To simplify our work, we can also make the assumption that a call is valid in general;
i.e., that all arguments either will receive values (by name or position), or will be omitted
intentionally to pick up defaults. This assumption won’t necessarily hold, because the
function has not yet actually been called when the wrapper logic tests validity: the call
may still fail later when invoked by the wrapper layer, due to incorrect argument passing.
As long as that doesn’t cause the wrapper to fail any more badly, though, we can
finesse the validity of the call. This helps, because validating calls before they are
actually made would require us to emulate Python’s argument-matching algorithm in
full, which is again too complex a procedure for our tool.


Matching algorithm 


Now, given these constraints and assumptions, we can allow for both keywords and 
omitted default arguments in the call with this algorithm. When a call is intercepted, 
we can make the following assumptions: 


* All N passed positional arguments in *pargs must match the first N expected
arguments obtained from the function’s code object. This is true per Python’s call
ordering rules, outlined earlier, since all positionals precede all keywords.


* To obtain the names of arguments actually passed by position, we can slice the list
of all expected arguments up to the length N of the *pargs positionals tuple.


* Any arguments after the first N expected arguments either were passed by keyword
or were defaulted by omission at the call.


* For each argument name to be validated, if it is in **kargs it was passed by name,
and if it is in the first N expected arguments it was passed by position (in which
case its relative position in the expected list gives its relative position in *pargs);
otherwise, we can assume it was omitted in the call and defaulted and need not be
checked.


In other words, we can skip tests for arguments that were omitted in a call by assuming 
that the first N actually passed positional arguments in *pargs must match the first N 
argument names in the list of all expected arguments, and that any others must either 
have been passed by keyword and thus be in **kargs, or have been defaulted. Under 
this scheme, the decorator will simply skip any argument to be checked that was omit- 
ted between the rightmost positional argument and the leftmost keyword argument, 
between keyword arguments, or after the rightmost positional in general. Trace 
through the decorator and its test script to see how this is realized in code. 
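The matching steps above can be isolated from the range tests in a small sketch. The helper how_passed below is a hypothetical name introduced only for illustration; it reports, for each name of interest, whether the value arrived by keyword, by position, or not at all, using the same slicing and lookup logic as the decorator:

```python
def how_passed(func, pargs, kargs, names):
    """Classify each name as passed by keyword, by position, or defaulted."""
    code = func.__code__
    expected = code.co_varnames[:code.co_argcount]
    positionals = list(expected[:len(pargs)])      # Names of the first N expected args
    report = {}
    for name in names:
        if name in kargs:
            report[name] = ('keyword', kargs[name])
        elif name in positionals:
            # Position in the expected list is also the position in pargs
            report[name] = ('position', pargs[positionals.index(name)])
        else:
            report[name] = ('defaulted', None)      # Assumed omitted at the call
    return report


def omitargs(a, b=7, c=8, d=9):
    pass


# Mimics the call omitargs(1, 2, d=4) as a wrapper would see it
report = how_passed(omitargs, (1, 2), {'d': 4}, ['a', 'b', 'c', 'd'])
for name in ['a', 'b', 'c', 'd']:
    print(name, report[name])
```

Here c falls between the rightmost positional and the only keyword, so it is classified as defaulted and would be skipped by the validator.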


Open Issues 


Although our range-testing tool works as planned, two caveats remain. First, as
mentioned earlier, calls to the original function that are not valid still fail in our final
decorator. The following both trigger exceptions, for example:


omitargs()
omitargs(d=8, c=7, b=6)


These only fail, though, where we try to invoke the original function, at the end of the
wrapper. While we could try to imitate Python’s argument matching to avoid this,
there’s not much reason to do so: since the call would fail at this point anyhow, we
might as well let Python’s own argument-matching logic detect the problem for us.
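To see that Python's own matching logic reports such errors, consider the following sketch. It uses a bare pass-through wrapper (passthrough and call_error are illustrative names, not part of devtools.py) in place of the full decorator, since the point is only where the failure occurs:

```python
def passthrough(func):
    """Wrapper with no validation: simply forwards the call."""
    def onCall(*pargs, **kargs):
        return func(*pargs, **kargs)        # Invalid calls fail here
    return onCall


@passthrough
def omitargs(a, b=7, c=8, d=9):
    return (a, b, c, d)


def call_error(func, *pargs, **kargs):
    """Return the TypeError message raised by a call, or None if it succeeds."""
    try:
        func(*pargs, **kargs)
    except TypeError as exc:
        return str(exc)
    return None


print(call_error(omitargs))                     # Required argument a is missing
print(call_error(omitargs, d=8, c=7, b=6))      # Same problem, with keywords
print(call_error(omitargs, 1))                  # Valid call: no error
```

The exact wording of the error messages varies between Python releases, so none is shown here.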


Lastly, although our final version handles positional arguments, keyword arguments, 
and omitted defaults, it still doesn’t do anything explicit about *args and **args that 
may be used in a decorated function that accepts arbitrarily many arguments. We 
probably don’t need to care for our purposes, though: 


* If an extra keyword argument is passed, its name will show up in **kargs and can
be tested normally if mentioned to the decorator.


* If an extra keyword argument is not passed, its name won’t be in either **kargs or
the sliced expected positionals list, and it will thus not be checked; it is treated as
though it were defaulted, even though it is really an optional extra argument.


* If an extra positional argument is passed, there’s no way to reference it in the
decorator anyhow: its name won’t be in either **kargs or the sliced expected
arguments list, so it will simply be skipped. Because such arguments are not listed in
the function’s definition, there’s no way to map a name given to the decorator back
to an expected relative position.


In other words, as it is the code supports testing arbitrary keyword arguments by name, 
but not arbitrary positionals that are unnamed and hence have no set position in the 
function’s argument signature. 


In principle, we could extend the decorator’s interface to support *args in the decorated 
function, too, for the rare cases where this might be useful (e.g., a special argument 
name with a test to apply to all arguments in the wrapper’s *pargs beyond the length 
of the expected arguments list). Since we’ve already exhausted the space allocation for 
this example, though, if you care about such improvements you've officially crossed 
over into the realm of suggested exercises. 


Decorator Arguments Versus Function Annotations 


Interestingly, the function annotation feature introduced in Python 3.0 could provide 
an alternative to the decorator arguments used by our example to specify range tests. 
As we learned in Chapter 19, annotations allow us to associate expressions with argu- 
ments and return values, by coding them in the def header line itself; Python collects 
annotations in a dictionary and attaches it to the annotated function. 


We could use this in our example to code range limits in the header line, instead of in 
decorator arguments. We would still need a function decorator to wrap the function 
in order to intercept later calls, but we would essentially trade decorator argument 
syntax: 

@rangetest(a=(1, 5), c=(0.0, 1.0))
def func(a, b, c):                        # func = rangetest(...)(func)
    print(a + b + c)


for annotation syntax like this:


@rangetest
def func(a:(1, 5), b, c:(0.0, 1.0)):
    print(a + b + c)


That is, the range constraints would be moved into the function itself, instead of being 
coded externally. The following script illustrates the structure of the resulting decora- 
tors under both schemes, in incomplete skeleton code. The decorator arguments code 
pattern is that of our complete solution shown earlier; the annotation alternative re- 
quires one less level of nesting, because it doesn’t need to retain decorator arguments: 


# Using decorator arguments

def rangetest(**argchecks):
    def onDecorator(func):
        def onCall(*pargs, **kargs):
            print(argchecks)
            for check in argchecks: pass          # Add validation code here
            return func(*pargs, **kargs)
        return onCall
    return onDecorator

@rangetest(a=(1, 5), c=(0.0, 1.0))
def func(a, b, c):                                # func = rangetest(...)(func)
    print(a + b + c)

func(1, 2, c=3)                                   # Runs onCall, argchecks in scope


# Using function annotations

def rangetest(func):
    def onCall(*pargs, **kargs):
        argchecks = func.__annotations__
        print(argchecks)
        for check in argchecks: pass              # Add validation code here
        return func(*pargs, **kargs)
    return onCall

@rangetest
def func(a:(1, 5), b, c:(0.0, 1.0)):              # func = rangetest(func)
    print(a + b + c)

func(1, 2, c=3)                                   # Runs onCall, annotations on func


When run, both schemes have access to the same validation test information, but in 
different forms—the decorator argument version’s information is retained in an argu- 
ment in an enclosing scope, and the annotation version’s information is retained in an 
attribute of the function itself: 


{'a': (1, 5), 'c': (0.0, 1.0)}
6
{'a': (1, 5), 'c': (0.0, 1.0)}
6


I’ll leave fleshing out the rest of the annotation-based version as a suggested exercise;
its code would be identical to that of our complete solution shown earlier, because
range-test information is simply on the function instead of in an enclosing scope. Really,
all this buys us is a different user interface for our tool: it will still need to match
argument names against expected argument names to obtain relative positions as
before.
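For readers who want a starting point for that exercise, here is one possible sketch of an annotation-based version. It is not the book's solution: it assumes Python 3.X, assumes every annotation on the function is a (low, high) tuple, skips any return annotation, and folds the two range tests of the original into one branch:

```python
def rangetest(func):
    """Range-test arguments using (low, high) annotations in the def header."""
    if not __debug__:                       # Skip wrapping under "python -O"
        return func
    code = func.__code__
    allargs = code.co_varnames[:code.co_argcount]
    funcname = func.__name__
    # Assumption: all argument annotations are range tuples
    argchecks = dict((name, limits)
                     for (name, limits) in func.__annotations__.items()
                     if name != 'return')

    def onCall(*pargs, **kargs):
        positionals = list(allargs)[:len(pargs)]
        for (argname, (low, high)) in argchecks.items():
            if argname in kargs:                        # Passed by name
                value = kargs[argname]
            elif argname in positionals:                # Passed by position
                value = pargs[positionals.index(argname)]
            else:                                       # Assume defaulted: skip
                continue
            if value < low or value > high:
                errmsg = '{0} argument "{1}" not in {2}..{3}'
                raise TypeError(errmsg.format(funcname, argname, low, high))
        return func(*pargs, **kargs)
    return onCall


@rangetest
def func(a: (1, 5), b, c: (0.0, 1.0)):
    return a + b + c


print(func(1, 2, c=0.5))                    # All checked arguments in range
try:
    func(6, 2, c=0.5)                       # a is out of range
except TypeError as exc:
    print(exc)
```

As the text notes, only the source of the range limits changes; the argument-matching logic is the same as in the decorator-arguments version.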


In fact, using annotation instead of decorator arguments in this example actually limits 
its utility. For one thing, annotation only works under Python 3.0, so 2.6 is no longer 
supported; function decorators with arguments, on the other hand, work in both 
versions. 


More importantly, by moving the validation specifications into the def header, we es- 
sentially commit the function to a single role—since annotation allows us to code only 
one expression per argument, it can have only one purpose. For instance, we cannot 
use range-test annotations for any other role. 


By contrast, because decorator arguments are coded outside the function itself, they 
are both easier to remove and more general—the code of the function itself does not 
imply a single decoration purpose. In fact, by nesting decorators with arguments, we 
can apply multiple augmentation steps to the same function; annotation directly sup- 
ports only one. With decorator arguments, the function itself also retains a simpler, 
normal appearance. 


Still, if you have a single purpose in mind, and you can commit to supporting 3.X only, 
the choice between annotation and decorator arguments is largely stylistic and subjec- 
tive. As is so often true in life, one person’s annotation may well be another’s syntactic 
clutter... 


Other Applications: Type Testing (If You Insist!) 


The coding pattern we’ve arrived at for processing arguments in decorators could be 
applied in other contexts. Checking argument data types at development time, for ex- 
ample, is a straightforward extension: 


def typetest(**argchecks):
    def onDecorator(func):
        ...
            def onCall(*pargs, **kargs):
                positionals = list(allargs)[:len(pargs)]
                for (argname, type) in argchecks.items():
                    if argname in kargs:
                        if not isinstance(kargs[argname], type):
                            ...
                            raise TypeError(errmsg)
                    elif argname in positionals:
                        position = positionals.index(argname)
                        if not isinstance(pargs[position], type):
                            ...
                            raise TypeError(errmsg)
                    else:
                        # Assume not passed: default
                        pass
                return func(*pargs, **kargs)
            return onCall
    return onDecorator

@typetest(a=int, c=float)
def func(a, b, c, d):                    # func = typetest(...)(func)
    ...

func(1, 2, 3.0, 4)                       # Okay
func('spam', 2, 99, 4)                   # Triggers exception correctly
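The listing above elides the parts it shares with the range tester. A complete, runnable sketch might look like the following; the filled-in portions (the debug test, the introspection setup, and the error message text) are assumptions modeled on rangetest rather than code from the original, and the loop variable is renamed so it does not shadow the built-in type:

```python
def typetest(**argchecks):
    def onDecorator(func):
        if not __debug__:                   # Skip wrapping under "python -O"
            return func
        code = func.__code__
        allargs = code.co_varnames[:code.co_argcount]
        funcname = func.__name__

        def onCall(*pargs, **kargs):
            positionals = list(allargs)[:len(pargs)]
            for (argname, expected) in argchecks.items():
                if argname in kargs:                    # Passed by name
                    value = kargs[argname]
                elif argname in positionals:            # Passed by position
                    value = pargs[positionals.index(argname)]
                else:                                   # Assume defaulted: skip
                    continue
                if not isinstance(value, expected):
                    errmsg = '{0} argument "{1}" not {2}'    # Assumed message format
                    raise TypeError(errmsg.format(funcname, argname, expected))
            return func(*pargs, **kargs)
        return onCall
    return onDecorator


@typetest(a=int, c=float)
def func(a, b, c, d):
    return (a, b, c, d)


print(func(1, 2, 3.0, 4))                   # Okay
try:
    func('spam', 2, 99, 4)                  # a is not an int
except TypeError as exc:
    print(exc)
```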


In fact, we might even generalize further by passing in a test function, much as we did 
to add Public decorations earlier; a single copy of this sort of code would suffice for 
both range and type testing. Using function annotations instead of decorator arguments 
for such a decorator, as described in the prior section, would make this look even more 
like type declarations in other languages: 

@typetest
def func(a: int, b, c: float, d):    # func = typetest(func)
    ...                              # Gasp!...

As you should have learned in this book, though, this particular role is generally a bad 
idea in working code, and not at all Pythonic (in fact, it’s often a symptom of an 
ex-C++ programmer’s first attempts to use Python). 


Type testing restricts your function to work on specific types only, instead of allowing 
it to operate on any types with compatible interfaces. In effect, it limits your code and 
breaks its flexibility. On the other hand, every rule has exceptions; type checking may 
come in handy in isolated cases while debugging and when interfacing with code writ- 
ten in more restrictive languages, such as C++. This general pattern of argument pro- 
cessing might also be applicable in a variety of less controversial roles. 
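The generalization mentioned above, passing in a test function so that one copy of the matching code serves several roles, could be sketched as follows. The names argtest and failif are assumptions made for this illustration; note that, as coded, a decorated function could not have a validated argument that is itself named failif:

```python
def argtest(failif, **argchecks):
    """Build a decorator that fails when failif(value, criteria) is true."""
    def onDecorator(func):
        if not __debug__:
            return func
        code = func.__code__
        allargs = code.co_varnames[:code.co_argcount]

        def onCall(*pargs, **kargs):
            positionals = list(allargs)[:len(pargs)]
            for (argname, criteria) in argchecks.items():
                if argname in kargs:                    # Passed by name
                    value = kargs[argname]
                elif argname in positionals:            # Passed by position
                    value = pargs[positionals.index(argname)]
                else:                                   # Assume defaulted: skip
                    continue
                if failif(value, criteria):
                    errmsg = '{0} argument "{1}" fails test {2!r}'
                    raise TypeError(errmsg.format(func.__name__, argname, criteria))
            return func(*pargs, **kargs)
        return onCall
    return onDecorator


def rangetest(**argchecks):                 # Criteria are (low, high) tuples
    return argtest(lambda value, limits: value < limits[0] or value > limits[1],
                   **argchecks)


def typetest(**argchecks):                  # Criteria are types
    return argtest(lambda value, kind: not isinstance(value, kind), **argchecks)


@rangetest(M=(1, 12), D=(1, 31), Y=(0, 2009))
def birthday(M, D, Y):
    return (M, D, Y)


@typetest(a=int, c=float)
def func(a, b, c, d):
    return (a, b, c, d)


print(birthday(5, D=1, Y=1963))
print(func(1, 2, 3.0, 4))
```

Each specialized decorator is now a thin layer that supplies only its own failure test.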


Chapter Summary 


In this chapter, we explored decorators—both the function and class varieties. As we 
learned, decorators are a way to insert code to be run automatically when a function 
or class is defined. When a decorator is used, Python rebinds a function or class name 
to the callable object it returns. This hook allows us to add a layer of wrapper logic to 
function calls and class instance creation calls, in order to manage functions and in- 
stances. As we also saw, manager functions and manual name rebinding can achieve 
the same effect, but decorators provide a more explicit and uniform solution. 


As we’ll see in the next chapter, class decorators can also be used to manage classes
themselves, rather than just their instances. Because this functionality overlaps with
metaclasses, the topic of the next chapter, you’ll have to read ahead for the rest of this
story. First, though, work through the following quiz. Because this chapter was mostly
focused on its larger examples, its quiz will ask you to modify some of its code in order
to review.


Test Your Knowledge: Quiz 


1. As mentioned in one of this chapter’s Notes, the timer function decorator with
decorator arguments that we wrote in the section “Adding Decorator Arguments”
on page 1008 can be applied only to simple functions, because it uses a
nested class with a __call__ operator overloading method to catch calls. This
structure does not work for class methods because the decorator instance is passed
to self, not the subject class instance. Rewrite this decorator so that it can be
applied to both simple functions and class methods, and test it on both functions
and methods. (Hint: see the section “Class Blunders I: Decorating Class Methods”
on page 1001 for pointers.) Note that you may make use of assigning function
object attributes to keep track of total time, since you won’t have a nested class for
state retention and can’t access nonlocals from outside the decorator code.


2. The Public/Private class decorators we wrote in this chapter will add overhead to
every attribute fetch in a decorated class. Although we could simply delete the @
decoration line to gain speed, we could also augment the decorator itself to check
the __debug__ switch and perform no wrapping at all when the -O Python flag is
passed on the command line (just as we did for the argument range-test decorators).
That way, we can speed our program without changing its source, via command-line
arguments (python -O main.py...). Code and test this extension.


Test Your Knowledge: Answers 


1. Here’s one way to code the first question’s solution, and its output (albeit with 
class methods that run too fast to time). The trick lies in replacing nested classes 
with nested functions, so the self argument is not the decorator’s instance, and 
assigning the total time to the decorator function itself so it can be fetched later 
through the original rebound name (see the section “State Information Retention 
Options” on page 997 of this chapter for details—functions support arbitrary at- 
tribute attachment, and the function name is an enclosing scope reference in this 
context). 


import time

def timer(label='', trace=True):             # On decorator args: retain args
    def onDecorator(func):                   # On @: retain decorated func
        def onCall(*args, **kargs):          # On calls: call original
            # Note: time.clock was removed in Python 3.8; use time.perf_counter there
            start   = time.clock()           # State is scopes + func attr
            result  = func(*args, **kargs)
            elapsed = time.clock() - start
            onCall.alltime += elapsed
            if trace:
                format = '%s%s: %.5f, %.5f'
                values = (label, func.__name__, elapsed, onCall.alltime)
                print(format % values)
            return result
        onCall.alltime = 0
        return onCall
    return onDecorator

# Test on functions

@timer(trace=True, label='[CCC]==>')
def listcomp(N):                             # Like listcomp = timer(...)(listcomp)
    return [x * 2 for x in range(N)]         # listcomp(...) triggers onCall

@timer(trace=True, label='[MMM]==>')
def mapcall(N):
    return list(map((lambda x: x * 2), range(N)))   # list() for 3.0 views

for func in (listcomp, mapcall):
    result = func(5)                         # Time for this call, all calls, return value
    func(5000000)
    print(result)
    print('allTime = %s\n' % func.alltime)   # Total time for all calls

# Test on methods

class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay  = pay

    @timer()
    def giveRaise(self, percent):            # giveRaise = timer()(giveRaise)
        self.pay *= (1.0 + percent)          # tracer remembers giveRaise

    @timer(label='**')
    def lastName(self):                      # lastName = timer(...)(lastName)
        return self.name.split()[-1]         # alltime per class, not instance

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
bob.giveRaise(.10)
sue.giveRaise(.20)                           # runs onCall(sue, .20)
print(bob.pay, sue.pay)
print(bob.lastName(), sue.lastName())        # runs onCall(bob), remembers lastName
print('%.5f %.5f' % (Person.giveRaise.alltime, Person.lastName.alltime))
# Expected output

[CCC]==>listcomp: 0.00002, 0.00002
[CCC]==>listcomp: 1.19636, 1.19638
[0, 2, 4, 6, 8]
allTime = 1.19637775192

[MMM]==>mapcall: 0.00002, 0.00002
[MMM]==>mapcall: 2.29260, 2.29262
[0, 2, 4, 6, 8]
allTime = 2.2926232943

giveRaise: 0.00001, 0.00001
giveRaise: 0.00001, 0.00002
55000.0 120000.0
**lastName: 0.00001, 0.00001
**lastName: 0.00001, 0.00002
Smith Jones
0.00002 0.00002


2. The following satisfies the second question: it’s been augmented to return the
original class in optimized mode (-O), so attribute accesses don’t incur a speed hit.
Really, all I did was add the debug mode test statements and indent the class further
to the right. Add operator overloading method redefinitions to the wrapper class
if you want to support delegation of these to the subject class in 3.0, too (2.6 routes
these through __getattr__, but 3.0 and new-style classes in 2.6 do not).


traceMe = False
def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def accessControl(failIf):
    def onDecorator(aClass):
        if not __debug__:
            return aClass
        else:
            class onInstance:
                def __init__(self, *args, **kargs):
                    self.__wrapped = aClass(*args, **kargs)
                def __getattr__(self, attr):
                    trace('get:', attr)
                    if failIf(attr):
                        raise TypeError('private attribute fetch: ' + attr)
                    else:
                        return getattr(self.__wrapped, attr)
                def __setattr__(self, attr, value):
                    trace('set:', attr, value)
                    if attr == '_onInstance__wrapped':
                        self.__dict__[attr] = value
                    elif failIf(attr):
                        raise TypeError('private attribute change: ' + attr)
                    else:
                        setattr(self.__wrapped, attr, value)
            return onInstance
    return onDecorator

def Private(*attributes):
    return accessControl(failIf=(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=(lambda attr: attr not in attributes))

# Test code: split me off to another file to reuse decorator

@Private('age')                             # Person = Private('age')(Person)
class Person:                               # Person = onInstance with state
    def __init__(self, name, age):
        self.name = name
        self.age  = age                     # Inside accesses run normally

X = Person('Bob', 40)
print(X.name)                               # Outside accesses validated
X.name = 'Sue'
print(X.name)
#print(X.age)                               # FAILS unless "python -O"
#X.age = 999                                # ditto
#print(X.age)                               # ditto

@Public('name')
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age  = age

X = Person('bob', 40)                       # X is an onInstance
print(X.name)                               # onInstance embeds Person
X.name = 'Sue'
print(X.name)
#print(X.age)                               # FAILS unless "python -O main.py"
#X.age = 999                                # ditto
#print(X.age)                               # ditto


CHAPTER 39
Metaclasses


In the prior chapter, we explored decorators and studied various examples of their use.
In this final chapter of the book, we’re going to continue our tool-builder’s focus and
investigate another advanced topic: metaclasses.


In a sense, metaclasses simply extend the code-insertion model of decorators. As we 
learned in the prior chapter, function and class decorators allow us to intercept and 
augment function calls and class instance creation calls. In a similar spirit, metaclasses
allow us to intercept and augment class creation—they provide an API for inserting 
extra logic to be run at the conclusion of a class statement, albeit in different ways than 
decorators. As such, they provide a general protocol for managing class objects in a 
program. 


Like all the subjects dealt with in this part of the book, this is an advanced topic that 
can be investigated on an as-needed basis. In practice, metaclasses allow us to gain a 
high level of control over how a set of classes work. This is a powerful concept, and 
metaclasses are not intended for most application programmers (nor, frankly, the faint 
of heart!). 


On the other hand, metaclasses open the door to a variety of coding patterns that may 
be difficult or impossible to achieve otherwise, and they are especially of interest to 
programmers seeking to write flexible APIs or programming tools for others to use. 
Even if you don’t fall into that category, metaclasses can teach you much about Python’s 
class model in general. 


As in the prior chapter, part of our goal here is also to show more realistic code examples
than we did earlier in this book. Although metaclasses are a core language topic and
not themselves an application domain, part of this chapter’s goal is to spark your
interest in exploring larger application-programming examples after you finish this book.


To Metaclass or Not to Metaclass


Metaclasses are perhaps the most advanced topic in this book, if not the Python lan- 
guage as a whole. To borrow a quote from the comp.lang.python newsgroup by veteran 
Python core developer Tim Peters (who is also the author of the famous “import this” 
Python motto): 


[Metaclasses] are deeper magic than 99% of users should ever worry about. If you wonder 
whether you need them, you don’t (the people who actually need them know with cer- 
tainty that they need them, and don’t need an explanation about why). 


In other words, metaclasses are primarily intended for programmers building APIs and 
tools for others to use. In many (if not most) cases, they are probably not the best choice 
in applications work. This is especially true if you’re developing code that other people 
will use in the future. Coding something “because it seems cool” is not generally a 
reasonable justification, unless you are experimenting or learning. 


Still, metaclasses have a wide variety of potential roles, and it’s important to know when 
they can be useful. For example, they can be used to enhance classes with features like 
tracing, object persistence, exception logging, and more. They can also be used to con- 
struct portions of a class at runtime based upon configuration files, apply function 
decorators to every method of a class generically, verify conformance to expected 
interfaces, and so on. 


In their more grandiose incarnations, metaclasses can even be used to implement al- 
ternative coding patterns such as aspect-oriented programming, object/relational map- 
pers (ORMs) for databases, and more. Although there are often alternative ways to 
achieve such results (as we’ll see, the roles of class decorators and metaclasses often 
intersect), metaclasses provide a formal model tailored to those tasks. We don’t have 
space to explore all such applications first-hand in this chapter but you should feel free 
to search the Web for additional use cases after studying the basics here. 


Probably the reason for studying metaclasses most relevant to this book is that this 
topic can help demystify Python’s class mechanics in general. Although you may or 
may not code or reuse them in your work, a cursory understanding of metaclasses can 
impart a deeper understanding of Python at large. 


Increasing Levels of Magic 


Most of this book has focused on straightforward application-coding techniques, as
most programmers spend their time writing modules, functions, and classes to achieve
real-world goals. They may use classes and make instances, and might even do a bit of
operator overloading, but they probably won’t get too deep into the details of how their
classes actually work.


However, in this book we’ve also seen a variety of tools that allow us to control Python’s
behavior in generic ways, and that often have more to do with Python internals or tool
building than with application-programming domains:


Introspection attributes 
Special attributes like __class__ and __dict__ allow us to inspect internal
implementation aspects of Python objects, in order to process them generically: to list
all attributes of an object, display a class’s name, and so on.


Operator overloading methods 
Specially named methods such as __str__ and __add__ coded in classes intercept
and provide behavior for built-in operations applied to class instances, such as 
printing, expression operators, and so on. They are run automatically in response 
to built-in operations and allow classes to conform to expected interfaces. 


Attribute interception methods 
A special category of operator overloading methods provide a way to intercept
attribute accesses on instances generically: __getattr__, __setattr__, and
__getattribute__ allow wrapper classes to insert automatically run code that may
validate attribute requests and delegate them to embedded objects. They allow any
number of attributes of an object (either selected attributes, or all of them) to be
computed when accessed.


Class properties 
The property built-in allows us to associate code with a specific class attribute that 
is automatically run when the attribute is fetched, assigned, or deleted. Though 
not as generic as the prior paragraph’s tools, properties allow for automatic code 
invocation on access to specific attributes. 


Class attribute descriptors 
Really, property is a succinct way to define an attribute descriptor that runs
functions on access automatically. Descriptors allow us to code in a separate class
__get__, __set__, and __delete__ handler methods that are run automatically when
an attribute assigned to an instance of that class is accessed. They provide a general
way to insert automatically run code when a specific attribute is accessed, and they
are triggered after an attribute is looked up normally.


Function and class decorators 

As we saw in Chapter 38, the special @callable syntax for decorators allows us to 
add logic to be automatically run when a function is called or a class instance is 
created. This wrapper logic can trace or time calls, validate arguments, manage all 
instances of a class, augment instances with extra behavior such as attribute fetch 
validation, and more. Decorator syntax inserts name-rebinding logic to be run at 
the end of function and class definition statements—decorated function and class 
names are rebound to callable objects that intercept later calls. 

As mentioned in this chapter’s introduction, metaclasses are a continuation of this
story: they allow us to insert logic to be run automatically when a class object is
created, at the end of a class statement. This logic doesn’t rebind the class name to a
decorator callable, but rather routes creation of the class itself to specialized logic.


In other words, metaclasses are ultimately just another way to define automatically run 
code. Via metaclasses and the other tools just listed, Python provides ways for us to 
interject logic in a variety of contexts—at operator evaluation, attribute access, function 
calls, class instance creation, and now class object creation. 


Unlike class decorators, which usually add logic to be run at instance creation time, 
metaclasses run at class creation time; as such, they are hooks generally used for man- 
aging or augmenting classes, instead of their instances. 


For example, metaclasses can be used to add decoration to all methods of classes 
automatically, register all classes in use to an API, add user-interface logic to classes 
automatically, create or extend classes from simplified specifications in text files, and 
so on. Because we can control how classes are made (and by proxy the behavior their 
instances acquire), their applicability is potentially very wide. 


As we’ve also seen, many of these advanced Python tools have intersecting roles. For 
example, attributes can often be managed with properties, descriptors, or attribute 
interception methods. As we’ll see in this chapter, class decorators and metaclasses can 
often be used interchangeably as well. Although class decorators are often used to 
manage instances, they can be used to manage classes instead; similarly, while meta- 
classes are designed to augment class construction, they can often insert code to manage 
instances, too. Since the choice of which technique to use is sometimes purely subjec- 
tive, knowledge of the alternatives can help you pick the right tool for a given task. 


The Downside of “Helper” Functions 


Also like the decorators of the prior chapter, metaclasses are often optional, from a 
theoretical perspective. We can usually achieve the same effect by passing class objects 
through manager functions (sometimes known as “helper” functions), much as we can 
achieve the goals of decorators by passing functions and instances through manager 
code. Just like decorators, though, metaclasses: 


* Provide a more formal and explicit structure

* Help ensure that application programmers won’t forget to augment their classes
according to an API’s requirements

* Avoid code redundancy and its associated maintenance costs by factoring class
customization logic into a single location, the metaclass


To illustrate, suppose we want to automatically insert a method into a set of classes. 
Of course, we could do this with simple inheritance, if the subject method is known 
when we code the classes. In that case, we can simply code the method in a superclass
and have all the classes in question inherit from it: 


class Extras:
    def extra(self, args):              # Normal inheritance: too static
        ...

class Client1(Extras): ...              # Clients inherit extra methods
class Client2(Extras): ...
class Client3(Extras): ...

X = Client1()                           # Make an instance
X.extra()                               # Run the extra methods


Sometimes, though, it’s impossible to predict such augmentation when classes are co- 
ded. Consider the case where classes are augmented in response to choices made in a 
user interface at runtime, or to specifications typed in a configuration file. Although 
we could code every class in our imaginary set to manually check these, too, it’s a lot 
to ask of clients (required is abstract here—it’s something to be filled in): 


def extra(self, arg): ...

class Client1: ...                      # Client augments: too distributed
if required():
    Client1.extra = extra

class Client2: ...
if required():
    Client2.extra = extra

class Client3: ...
if required():
    Client3.extra = extra

X = Client1()
X.extra()


We can add methods to a class after the class statement like this because a class method 
is just a function that is associated with a class and has a first argument to receive the 
self instance. Although this works, it puts all the burden of augmentation on client 
classes (and assumes they’ll remember to do this at all!). 


It would be better from a maintenance perspective to isolate the choice logic in a single 
place. We might encapsulate some of this extra work by routing classes through a
manager function—such a manager function would extend the class as required and 
handle all the work of runtime testing and configuration: 


def extra(self, arg): ...

def extras(Class):                      # Manager function: too manual
    if required():
        Class.extra = extra

class Client1: ...
extras(Client1)

class Client2: ...
extras(Client2)

class Client3: ...
extras(Client3)

X = Client1()
X.extra()


This code runs the class through a manager function immediately after it is created. 
Although manager functions like this one can achieve our goal here, they still put a 
fairly heavy burden on class coders, who must understand the requirements and adhere 
to them in their code. It would be better if there were a simple way to enforce the 
augmentation in the subject classes, so that they don’t need to deal with and can’t forget 
to use the augmentation. In other words, we’d like to be able to insert some code to 
run automatically at the end of a class statement, to augment the class. 


This is exactly what metaclasses do—by declaring a metaclass, we tell Python to route 
the creation of the class object to another class we provide: 


def extra(self, arg): ...

class Extras(type):
    def __init__(Class, classname, superclasses, attributedict):
        if required():
            Class.extra = extra

class Client1(metaclass=Extras): ...    # Metaclass declaration only
class Client2(metaclass=Extras): ...    # Client class is instance of meta
class Client3(metaclass=Extras): ...

X = Client1()                           # X is instance of Client1
X.extra()


Because Python invokes the metaclass automatically at the end of the class statement 
when the new class is created, it can augment, register, or otherwise manage the class 
as needed. Moreover, the only requirement for the client classes is that they declare the 
metaclass; every class that does so will automatically acquire whatever augmentation 
the metaclass provides, both now and in the future if the metaclass changes. Although 
it may be difficult to see in this small example, metaclasses generally handle such tasks 
better than other approaches. 
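
To experiment with this, the abstract listing above can be filled in so that it actually runs. The following is only a sketch under stated assumptions: the ENABLED flag, the body of required, and the body of extra are all invented here for illustration, and only the Extras metaclass itself mirrors the text.

```python
# Hypothetical stand-ins: ENABLED, required(), and the body of extra() are
# assumptions made for this sketch; only the Extras metaclass follows the text.
ENABLED = True

def required():
    return ENABLED                       # Imagine a GUI choice or config file here

def extra(self, arg):
    return 'extra: %s' % arg

class Extras(type):
    def __init__(Class, classname, superclasses, attributedict):
        super().__init__(classname, superclasses, attributedict)
        if required():
            Class.extra = extra          # Augment the newly created class object

class Client1(metaclass=Extras): ...     # Metaclass declaration only
class Client2(metaclass=Extras): ...

X = Client1()
print(X.extra('spam'))                   # Method was inserted by the metaclass
print(type(Client1))                     # Client1 is an instance of Extras
```

The client classes contain nothing but the metaclass declaration; the decision about whether to add the method lives in one place.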


Metaclasses Versus Class Decorators: Round 1 


Having said that, it’s also interesting to note that the class decorators described in the 
preceding chapter sometimes overlap with metaclasses in terms of functionality. Al- 
though they are typically used for managing or augmenting instances, class decorators 
can also augment classes, independent of any created instances. 

For example, suppose we coded our manager function to return the augmented class,
instead of simply modifying it in-place. This would allow a greater degree of flexibility, 
because the manager would be free to return any type of object that implements the 
class’s expected interface: 


def extra(self, arg): ... 


def extras(Class):
    if required():
        Class.extra = extra
    return Class


class Client1: ... 
Client1 = extras(Client1) 


class Client2: ... 
Client2 = extras(Client2) 


class Client3: ... 
Client3 = extras(Client3) 


X = Client1() 
X.extra() 


If you think this is starting to look reminiscent of class decorators, you're right. In the 
prior chapter we presented class decorators as a tool for augmenting instance creation 
calls. Because they work by automatically rebinding a class name to the result of a 
function, though, there’s no reason that we can’t use them to augment the class before 
any instances are ever created. That is, class decorators can apply extra logic to 
classes, not just instances, at creation time: 


def extra(self, arg): ...

def extras(Class):
    if required():
        Class.extra = extra
    return Class

@extras
class Client1: ...                      # Client1 = extras(Client1)

@extras
class Client2: ...                      # Rebinds class independent of instances

@extras
class Client3: ...

X = Client1()                           # Makes instance of augmented class
X.extra()                               # X is instance of original Client1


Decorators essentially automate the prior example’s manual name rebinding here. Just 
like with metaclasses, because the decorator returns the original class, instances are 
made from it, not from a wrapper object. In fact, instance creation is not intercepted
at all. 


In this specific case—adding methods to a class when it’s created—the choice between 
metaclasses and decorators is somewhat arbitrary. Decorators can be used to manage 
both instances and classes, and they intersect with metaclasses in the second of these 
roles. 
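
A runnable variation may make the point about returning the original class more concrete. This is a sketch only: required and the body of extra are hypothetical placeholders, and the Plain and Decorated names exist just to compare the manual and automatic forms of rebinding.

```python
def required():                          # Hypothetical runtime check
    return True

def extra(self, arg):                    # Hypothetical method body
    return 'extra: %s' % arg

def extras(Class):
    if required():
        Class.extra = extra
    return Class                         # Same class object, augmented in place

class Plain: ...
Decorated = extras(Plain)                # Manual form of what @extras automates

@extras
class Client1: ...

print(Decorated is Plain)                # The decorator returned the original class
print(type(Client1))                     # Still an instance of the plain type class
print(Client1().extra('spam'))           # Instances come from the augmented class
```

Because no wrapper is involved, the decorated class remains an ordinary instance of type, which is one visible difference from the metaclass version.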


However, this really addresses only one operational mode of metaclasses. As we'll see, 
decorators correspond to metaclass __init__ methods in this role, but metaclasses have
additional customization hooks. As we’ll also see, in addition to class initialization, 
metaclasses can perform arbitrary construction tasks that might be more difficult with 
decorators. 


Moreover, although decorators can manage both instances and classes, the converse is 
not as direct—metaclasses are designed to manage classes, and applying them to man- 
aging instances is less straightforward. We’ll explore this difference in code later in this 
chapter. 


Much of this section’s code has been abstract, but we’ll flesh it out into a real working 
example later in this chapter. To fully understand how metaclasses work, though, we 
first need to get a clearer picture of their underlying model. 


The Metaclass Model 


To really understand how metaclasses do their work, you need to understand a bit more 
about Python’s type model and what happens at the end of a class statement. 


Classes Are Instances of type 


So far in this book, we’ve done most of our work by making instances of built-in types 
like lists and strings, as well as instances of classes we code ourselves. As we’ve seen, 
instances of classes have some state information attributes of their own, but they also 
inherit behavioral attributes from the classes from which they are made. The same holds 
true for built-in types; list instances, for example, have values of their own, but they 
inherit methods from the list type. 


While we can get a lot done with such instance objects, Python’s type model turns out 
to be a bit richer than I’ve formally described. Really, there’s a hole in the model we’ve 
seen thus far: if instances are created from classes, what is it that creates our classes? It 
turns out that classes are instances of something, too: 


* In Python 3.0, user-defined class objects are instances of the object named type,
which is itself a class.

* In Python 2.6, new-style classes inherit from object and are instances of type;
classic classes are instances of the built-in classobj type and are not created from a class.

We explored the notion of types in Chapter 9 and the relationship of classes to types
in Chapter 31, but let’s review the basics here so we can see how they apply to 
metaclasses. 


Recall that the type built-in returns the type of any object (which is itself an object). 
For built-in types like lists, the type of the instance is the built-in list type, but the type 
of the list type is the type type itself—the type object at the top of the hierarchy creates 
specific types, and specific types create instances. You can see this for yourself at the 
interactive prompt. In Python 3.0, for example: 


C:\misc> c:\python30\python
>>> type([])                            # In 3.0 list is instance of list type
<class 'list'>
>>> type(type([]))                      # Type of list is type class
<class 'type'>
>>> type(list)                          # Same, but with type names
<class 'type'>
>>> type(type)                          # Type of type is type: top of hierarchy
<class 'type'>


As we learned when studying new-style class changes in Chapter 31, the same is gen- 
erally true in Python 2.6 (and older), but types are not quite the same as classes— 
type is a unique kind of built-in object that caps the type hierarchy and is used to 
construct types: 

C:\misc> c:\python26\python
>>> type([])                            # In 2.6, type is a bit different
<type 'list'>
>>> type(type([]))
<type 'type'>
>>> type(list)
<type 'type'>
>>> type(type)
<type 'type'>

It turns out that the type/instance relationship holds true for classes as well: instances 
are created from classes, and classes are created from type. In Python 3.0, though, the 
notion of a “type” is merged with the notion of a “class.” In fact, the two are essentially 
synonyms—classes are types, and types are classes. That is: 


* Types are defined by classes that derive from type.

* User-defined classes are instances of type classes.

* User-defined classes are types that generate instances of their own.

As we saw earlier, this equivalence affects code that tests the type of instances: the type
of an instance is the class from which it was generated. It also has implications for the
way that classes are created that turn out to be the key to this chapter’s subject. Because
classes are normally created from a root type class by default, most programmers don’t
need to think about this type/class equivalence. However, it opens up new possibilities
for customizing both classes and their instances.


For example, classes in 3.0 (and new-style classes in 2.6) are instances of the type class, 
and instance objects are instances of their classes; in fact, classes now have a 
__class__ that links to type, just as an instance has a __class__ that links to the class
from which it was made: 


C:\misc> c:\python30\python
>>> class C: pass                       # 3.0 class object (new-style)
>>> X = C()                             # Class instance object
>>> type(X)                             # Instance is instance of class
<class '__main__.C'>
>>> X.__class__                         # Instance's class
<class '__main__.C'>
>>> type(C)                             # Class is instance of type
<class 'type'>
>>> C.__class__                         # Class's class is type
<class 'type'>


Notice especially the last two lines here—classes are instances of the type class, just as 
normal instances are instances of a class. This works the same for both built-ins and 
user-defined class types in 3.0. In fact, classes are not really a separate concept at all: 
they are simply user-defined types, and type itself is defined by a class. 


In Python 2.6, things work similarly for new-style classes derived from object, because 
this enables 3.0 class behavior: 


C:\misc> c:\python26\python
>>> class C(object): pass               # In 2.6 new-style classes,
...                                     # classes have a class too
>>> X = C()
>>> type(X)
<class '__main__.C'>
>>> type(C)
<type 'type'>
>>> X.__class__
<class '__main__.C'>
>>> C.__class__
<type 'type'>

Classic classes in 2.6 are a bit different, though. Because they reflect the class model
in older Python releases, they do not have a __class__ link, and like built-in types in
2.6 they are instances of type, not a type class:

C:\misc> c:\python26\python
>>> class C: pass                       # In 2.6 classic classes,
...                                     # classes have no class themselves
>>> X = C()
>>> type(X)
<type 'instance'>
>>> type(C)
<type 'classobj'>
>>> X.__class__
<class __main__.C at 0x005F85A0>
>>> C.__class__
AttributeError: class C has no attribute '__class__'


Metaclasses Are Subclasses of Type 


So why do we care that classes are instances of a type class in 3.0? It turns out that this 
is the hook that allows us to code metaclasses. Because the notion of type is the same 
as class today, we can subclass type with normal object-oriented techniques and class 
syntax to customize it. And because classes are really instances of the type class, 
creating classes from customized subclasses of type allows us to implement custom 
kinds of classes. In full detail, this all works out quite naturally—in 3.0, and in 2.6 new- 
style classes: 


* type is a class that generates user-defined classes.

* Metaclasses are subclasses of the type class.

* Class objects are instances of the type class, or a subclass thereof.

* Instance objects are generated from a class.


In other words, to control the way classes are created and augment their behavior, all 
we need to do is specify that a user-defined class be created from a user-defined meta- 
class instead of the normal type class. 


Notice that this type instance relationship is not quite the same as inheritance: user- 
defined classes may also have superclasses from which they and their instances inherit 
attributes (inheritance superclasses are listed in parentheses in the class statement and 
show up in a class’s __bases__ tuple). The type from which a class is created, and of
which it is an instance, is a different relationship. The next section describes the pro- 
cedure Python follows to implement this instance-of type relationship. 
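
The distinction between the two relationships can be checked directly in 3.X. The following sketch assumes a do-nothing metaclass named Meta and placeholder classes Eggs and Spam; it uses the metaclass keyword syntax that is described in full later in this chapter.

```python
class Meta(type):                        # A do-nothing metaclass for illustration
    pass

class Eggs:
    pass

class Spam(Eggs, metaclass=Meta):        # Inherits from Eggs, created by Meta
    pass

print(Spam.__bases__)                    # Inheritance link: superclasses tuple
print(type(Spam))                        # Instance-of link: the metaclass
print(isinstance(Spam, Meta))            # Spam is an instance of Meta
print(issubclass(Spam, Meta))            # But Spam does not inherit from Meta
print(issubclass(Spam, Eggs))            # Normal inheritance is unaffected
```

The __bases__ tuple and the result of type answer two different questions about the same class object.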


Class Statement Protocol 


Subclassing the type class to customize it is really only half of the magic behind meta- 
classes. We still need to somehow route a class’s creation to the metaclass, instead of 
the default type. To fully understand how this is arranged, we also need to know how 
class statements do their business. 


We’ve already learned that when Python reaches a class statement, it runs its nested 
block of code to create its attributes—all the names assigned at the top level of the 
nested code block generate attributes in the resulting class object. These names are 
usually method functions created by nested defs, but they can also be arbitrary
attributes assigned to create class data shared by all instances.


Technically speaking, Python follows a standard protocol to make this happen: at the 
end of a class statement, and after running all its nested code in a namespace dictionary, 
it calls the type object to create the class object: 


class = type(classname, superclasses, attributedict)

The type object in turn defines a __call__ operator overloading method that runs two
other methods when the type object is called:


type.__new__(typeclass, classname, superclasses, attributedict)
type.__init__(class, classname, superclasses, attributedict)

The __new__ method creates and returns the new class object, and then the __init__
method initializes the newly created object. As we’ll see in a moment, these are the
hooks that metaclass subclasses of type generally use to customize classes.


For example, given a class definition like the following: 


class Spam(Eggs):                       # Inherits from Eggs
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        pass


Python will internally run the nested code block to create two attributes of the class 
(data and meth), and then call the type object to generate the class object at the end of 
the class statement: 


Spam = type('Spam', (Eggs,), {'data': 1, 'meth': meth, '__module__': '__main__'})


Because this call is made at the end of the class statement, it’s an ideal hook for aug- 
menting or otherwise processing a class. The trick lies in replacing type with a custom 
subclass that will intercept this call. The next section shows how. 
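
Since the call to type is an ordinary call, we can also make it by hand and compare the result with what a class statement would produce. This is a rough sketch: the body of meth is invented for the demonstration, and the namespace dictionary Python really builds may hold additional entries in later 3.X releases.

```python
class Eggs:
    pass

def meth(self, arg):                     # Plain function that becomes a method
    return arg

# Roughly what Python runs at the end of the class statement; the real
# namespace dictionary may contain more entries than are shown here
Spam = type('Spam', (Eggs,), {'data': 1, 'meth': meth, '__module__': __name__})

X = Spam()
print(X.data, X.meth('arg'))             # Attributes came from the dictionary
print(Spam.__bases__, type(Spam))        # Superclass tuple, and the creating type
```

The class built this way behaves like one coded with a class statement, which is why swapping a subclass in for type is enough to customize creation.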


Declaring Metaclasses 


As we’ve just seen, classes are created by the type class by default. To tell Python to 
create a class with a custom metaclass instead, you simply need to declare a metaclass 
to intercept the normal class creation call. How you do so depends on which Python 
version you are using. In Python 3.0, list the desired metaclass as a keyword argument 
in the class header: 


class Spam(metaclass=Meta):             # 3.0 and later

Inheritance superclasses can be listed in the header as well, before the metaclass. In the
following, for example, the new class Spam inherits from Eggs but is also an instance of
and is created by Meta:


class Spam(Eggs, metaclass=Meta):       # Other supers okay

We can get the same effect in Python 2.6, but we must specify the metaclass
differently—using a class attribute instead of a keyword argument. The object deriva- 
tion is required to make this a new-style class, and this form no longer works in 3.0 as 
the attribute is simply ignored: 


class Spam(object):                     # 2.6 version (only)
    __metaclass__ = Meta


In 2.6, a module-global __metaclass__ variable is also available to link all classes in the 
module to a metaclass. This is no longer supported in 3.0, as it was intended as a 
temporary measure to make it easier to default to new-style classes without deriving 
every class from object. 


When declared in these ways, the call to create the class object run at the end of the 
class statement is modified to invoke the metaclass instead of the type default: 

class = Meta(classname, superclasses, attributedict)

And because the metaclass is a subclass of type, the type class’s __call__ delegates the
calls to create and initialize the new class object to the metaclass, if it defines custom
versions of these methods:


Meta.__new__(Meta, classname, superclasses, attributedict)
Meta.__init__(class, classname, superclasses, attributedict)


To demonstrate, here’s the prior section’s example again, augmented with a 3.0 
metaclass specification: 


class Spam(Eggs, metaclass=Meta):       # Inherits from Eggs, instance of Meta
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        pass


At the end of this class statement, Python internally runs the following to create the 
class object: 


Spam = Meta('Spam', (Eggs,), {'data': 1, 'meth': meth, '__module__': '__main__'})


If the metaclass defines its own versions of __new__ or __init__, they will be invoked
in turn during this call by the inherited type class’s __call__ method, to create and 
initialize the new class. The next section shows how we might go about coding this 
final piece of the metaclass puzzle. 


Coding Metaclasses 


So far, we’ve seen how Python routes class creation calls to a metaclass, if one is pro- 
vided. How, though, do we actually code a metaclass that customizes type? 


It turns out that you already know most of the story—metaclasses are coded with 
normal Python class statements and semantics. Their only substantial distinctions are 
that Python calls them automatically at the end of a class statement, and that they 
must adhere to the interface expected by the type superclass. 


A Basic Metaclass


Perhaps the simplest metaclass you can code is simply a subclass of type with a 
__new__ method that creates the class object by running the default version in type. A
metaclass __new__ like this is run by the __call__ method inherited from type; it
typically performs whatever customization is required and calls the type superclass’s
__new__ method to create and return the new class object:

class Meta(type):
    def __new__(meta, classname, supers, classdict):
        # Run by inherited type.__call__
        return type.__new__(meta, classname, supers, classdict)


This metaclass doesn’t really do anything (we might as well let the default type class 
create the class), but it demonstrates the way a metaclass taps into the metaclass hook 
to customize—because the metaclass is called at the end of a class statement, and 
because the type object’s __call__ dispatches to the __new__ and __init__ methods,
code we provide in these methods can manage all the classes created from the metaclass. 
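
To see a __new__ that does some actual work, consider the following minimal sketch. The AddVersion metaclass and its version attribute are hypothetical names chosen for illustration; the point is only that __new__ receives the namespace dictionary before the class object exists, so it can adjust that dictionary before passing it on to type.

```python
class AddVersion(type):                  # Hypothetical metaclass for this sketch
    def __new__(meta, classname, supers, classdict):
        classdict.setdefault('version', 1)     # Supply a default class attribute
        return type.__new__(meta, classname, supers, classdict)

class Spam(metaclass=AddVersion):
    data = 1                             # No version coded: default is added

class Ham(metaclass=AddVersion):
    version = 2                          # Explicit value is left alone

print(Spam.version, Ham.version)
print(type(Spam))                        # Created by the metaclass
```

The same change could be made after creation in __init__; doing it in __new__ simply means the attribute is already part of the class when it is built.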


Here’s our example in action again, with prints added to the metaclass and the file at 
large to trace: 

class MetaOne(type):
    def __new__(meta, classname, supers, classdict):
        print('In MetaOne.new:', classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaOne):    # Inherits from Eggs, instance of Meta
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        pass

print('making instance')
X = Spam()
print('data:', X.data)

Here, Spam inherits from Eggs and is an instance of MetaOne, but X is an instance of and 
inherits from Spam. When this code is run with Python 3.0, notice how the metaclass 
is invoked at the end of the class statement, before we ever make an instance— 
metaclasses are for processing classes, and classes are for processing instances: 


making class
In MetaOne.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02AEBA08>}
making instance
data: 1


Customizing Construction and Initialization


Metaclasses can also tap into the __init__ protocol invoked by the type object’s
__call__: in general, __new__ creates and returns the class object, and __init__
initializes the already created class. Metaclasses can use both hooks to manage the class at
creation time:

class MetaOne(type):
    def __new__(meta, classname, supers, classdict):
        print('In MetaOne.new: ', classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)

    def __init__(Class, classname, supers, classdict):
        print('In MetaOne init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaOne):    # Inherits from Eggs, instance of Meta
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        pass

print('making instance')
X = Spam()
print('data:', X.data)

In this case, the class initialization method is run after the class construction method, 
but both run at the end of the class statement before any instances are made: 


making class
In MetaOne.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02AAB810>}
In MetaOne init:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02AAB810>}
...init class object: ['__module__', 'data', 'meth', '__doc__']
making instance
data: 1
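
Because __init__ runs after the class object exists, it is a natural place for tasks that need the finished class, such as the registration role mentioned earlier in this chapter. The following is a minimal sketch under stated assumptions: the registry dictionary and the Registered metaclass are hypothetical names, not part of any API.

```python
registry = {}                            # Hypothetical module-level registry

class Registered(type):                  # Hypothetical metaclass for this sketch
    def __init__(Class, classname, supers, classdict):
        super().__init__(classname, supers, classdict)
        registry[classname] = Class      # The class object already exists here

class Spam(metaclass=Registered): ...
class Ham(Spam): ...                     # Subclass: metaclass runs for it too

print(sorted(registry))                  # Both classes were recorded
```

Note that Ham never names the metaclass, yet it is registered as well, because it is created by the same metaclass as its superclass.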


Other Metaclass Coding Techniques 


Although redefining the type superclass’s __new__ and __init__ methods is the most
common way metaclasses insert logic into the class object creation process, other 
schemes are possible, too. 


Using simple factory functions


For example, metaclasses need not really be classes at all. As we’ve learned, the class 
statement issues a simple call to create a class at the conclusion of its processing. Be- 
cause of this, any callable object can in principle be used as a metaclass, provided it 
accepts the arguments passed and returns an object compatible with the intended class. 
In fact, a simple object factory function will serve just as well as a class: 


# A simple function can serve as a metaclass too

def MetaFunc(classname, supers, classdict):
    print('In MetaFunc: ', classname, supers, classdict, sep='\n...')
    return type(classname, supers, classdict)

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaFunc):   # Run simple function at end
    data = 1                            # Function returns class
    def meth(self, args):
        pass

print('making instance')
X = Spam()
print('data:', X.data)

When run, the function is called at the end of the declaring class statement, and it 
returns the expected new class object. The function is simply catching the call that the 
type object’s __call__ normally intercepts by default:


making class
In MetaFunc:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02B8B6A8>}
making instance
data: 1
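
One consequence of the function-based form is worth verifying for yourself: because the function simply returns whatever type creates, the resulting class is an instance of type, not of the function. The sketch below records calls in a hypothetical calls list to show that the function is not rerun for a subclass.

```python
calls = []                               # Hypothetical trace list for this sketch

def MetaFunc(classname, supers, classdict):
    calls.append(classname)              # Record each class routed through here
    return type(classname, supers, classdict)

class Spam(metaclass=MetaFunc):          # Function is called for this class
    pass

class Sub(Spam):                         # Created by type(Spam), which is type
    pass

print(type(Spam))                        # A plain type instance
print(calls)                             # Sub did not pass through MetaFunc
```

A class-based metaclass behaves differently here, since subclasses are created by the class of their superclass.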


Overloading class creation calls with metaclasses 


Since they participate in normal OOP mechanics, it’s also possible for metaclasses to 
catch the creation call at the end of a class statement directly, by redefining the type 
object’s __call__. The required protocol is a bit involved, though:


# __call__ can be redefined, metas can have metas

class SuperMeta(type):
    def __call__(meta, classname, supers, classdict):
        print('In SuperMeta.call: ', classname, supers, classdict, sep='\n...')
        return type.__call__(meta, classname, supers, classdict)

class SubMeta(type, metaclass=SuperMeta):
    def __new__(meta, classname, supers, classdict):
        print('In SubMeta.new: ', classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)

    def __init__(Class, classname, supers, classdict):
        print('In SubMeta init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=SubMeta):
    data = 1
    def meth(self, arg):
        pass

print('making instance')
X = Spam()
print('data:', X.data)

When this code is run, all three redefined methods run in turn. This is essentially what 
the type object does by default: 


making class
In SuperMeta.call:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02B7BA98>}
In SubMeta.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02B7BA98>}
In SubMeta init:
...Spam
...(<class '__main__.Eggs'>,)
...{'__module__': '__main__', 'data': 1, 'meth': <function meth at 0x02B7BA98>}
...init class object: ['__module__', 'data', 'meth', '__doc__']
making instance
data: 1


Overloading class creation calls with normal classes 


The preceding example is complicated by the fact that metaclasses are used to create 
class objects, but don’t generate instances of themselves. Because of this, with meta- 
classes name lookup rules are somewhat different than what we are accustomed to. 
The __call__ method, for example, is looked up in the class of an object; for
metaclasses, this means the metaclass of a metaclass.
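
This lookup rule can be observed by giving both levels a __call__ and noting which one runs when. The following sketch reuses the SuperMeta and SubMeta names from the preceding example, but the log list and the __call__ coded in SubMeta are additions made here for illustration only.

```python
log = []                                 # Hypothetical trace list for this sketch

class SuperMeta(type):
    def __call__(meta, classname, supers, classdict):
        log.append('SuperMeta.__call__: ' + classname)
        return type.__call__(meta, classname, supers, classdict)

class SubMeta(type, metaclass=SuperMeta):
    def __call__(Class, *args):
        log.append('SubMeta.__call__: ' + Class.__name__)
        return type.__call__(Class, *args)

class Spam(metaclass=SubMeta):           # Calls SubMeta: __call__ found in SuperMeta
    pass

X = Spam()                               # Calls Spam: __call__ found in SubMeta
print(log)
```

In each case the __call__ that runs is the one in the class of the object being called: SuperMeta for the call to SubMeta, and SubMeta for the call to Spam.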


To use normal inheritance-based name lookup, we can achieve the same effect with 
normal classes and instances. The output of the following is the same as the preceding 
version, but note that __new__ and __init__ must have different names here, or else
they will run when the SubMeta instance is created, not when it is later called as a 
metaclass: 

class SuperMeta:
    def __call__(self, classname, supers, classdict):
        print('In SuperMeta.call: ', classname, supers, classdict, sep='\n...')
        Class = self.__New__(classname, supers, classdict)
        self.__Init__(Class, classname, supers, classdict)
        return Class

class SubMeta(SuperMeta):
    def __New__(self, classname, supers, classdict):
        print('In SubMeta.new: ', classname, supers, classdict, sep='\n...')
        return type(classname, supers, classdict)

    def __Init__(self, Class, classname, supers, classdict):
        print('In SubMeta init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=SubMeta()):  # Meta is normal class instance
    data = 1                            # Called at end of statement
    def meth(self, arg):
        pass

print('making instance')
X = Spam()
print('data:', X.data)


Although these alternative forms work, most metaclasses get their work done by 
redefining the type superclass’s __new__ and __init__; in practice, this is usually as much 
control as is required, and it’s often simpler than other schemes. However, we’ll see 
later that a simple function-based metaclass can often work much like a class decorator, 
which allows the metaclasses to manage instances as well as classes. 


Instances Versus Inheritance 


Because metaclasses are specified in similar ways to inheritance superclasses, they can 
be a bit confusing at first glance. A few key points should help summarize and clarify 
the model: 


* Metaclasses inherit from the type class. Although they have a special role, 
metaclasses are coded with class statements and follow the usual OOP model in 
Python. For example, as subclasses of type, they can redefine the type object’s 
methods, overriding and customizing them as needed. Metaclasses typically 
redefine the type class’s __new__ and __init__ to customize class creation and 
initialization, but they can also redefine __call__ if they wish to catch the end-of-class 
creation call directly. Although it’s unusual, they can even be simple functions that 
return arbitrary objects, instead of type subclasses. 


* Metaclass declarations are inherited by subclasses. The metaclass=M 
declaration in a user-defined class is inherited by the class’s subclasses, too, so the 
metaclass will run for the construction of each class that inherits this specification in a 
superclass chain. 


* Metaclass attributes are not inherited by class instances. Metaclass 
declarations specify an instance relationship, which is not the same as inheritance. Because 
classes are instances of metaclasses, the behavior defined in a metaclass applies to 
the class, but not the class’s later instances. Instances obtain behavior from their 
classes and superclasses, but not from any metaclasses. Technically, instance 
attribute lookups usually search only the __dict__ dictionaries of the instance and 
all its classes; the metaclass is not included in inheritance lookup. 


To illustrate the last two points, consider the following example: 


class MetaOne(type):
    def __new__(meta, classname, supers, classdict):    # Redefine type method
        print('In MetaOne.new:', classname)
        return type.__new__(meta, classname, supers, classdict)
    def toast(self):
        print('toast')

class Super(metaclass=MetaOne):     # Metaclass inherited by subs too
    def spam(self):                 # MetaOne run twice for two classes
        print('spam')

class C(Super):                     # Superclass: inheritance versus instance
    def eggs(self):                 # Classes inherit from superclasses
        print('eggs')               # But not from metaclasses

X = C()
X.eggs()                            # Inherited from C
X.spam()                            # Inherited from Super
X.toast()                           # Not inherited from metaclass


When this code is run, the metaclass handles construction of both client classes, and 
instances inherit class attributes but not metaclass attributes: 


In MetaOne.new: Super
In MetaOne.new: C
eggs
spam
AttributeError: 'C' object has no attribute 'toast'


Although detail matters, it’s important to keep the big picture in mind when dealing 
with metaclasses. Metaclasses like those we’ve seen here will be run automatically for 
every class that declares them. Unlike the helper function approaches we saw earlier, 
such classes will automatically acquire whatever augmentation the metaclass provides. 
Moreover, changes in such augmentation only need to be coded in one place—the 
metaclass—which simplifies making modifications as our needs evolve. Like so many 
tools in Python, metaclasses ease maintenance work by eliminating redundancy. To 
fully sample their power, though, we need to move on to some larger use-case examples. 
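Before moving on, the instance-versus-inheritance distinction summarized above can be checked directly. The following is a minimal sketch, a trimmed variant of the MetaOne example in which toast returns a string instead of printing (an assumption made here so that the result is easy to inspect). The class itself can reach the metaclass's method because the class is an instance of the metaclass, while an instance of the class cannot:

```python
class MetaOne(type):
    def toast(cls):                      # Receives the class, not an instance
        return 'toast from ' + cls.__name__

class Super(metaclass=MetaOne):
    def spam(self):
        return 'spam'

class C(Super):                          # Metaclass declaration is inherited
    pass

X = C()
print(type(C) is MetaOne)                # C was built by MetaOne too
print(C.toast())                         # Class acquires metaclass attributes
print(hasattr(X, 'toast'))               # Instance of the class does not
```

In other words, metaclass attributes are visible one level down (to classes), but not two levels down (to instances of those classes).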




Example: Adding Methods to Classes 


In this and the following section, we’re going to study examples of two common use 
cases for metaclasses: adding methods to a class, and decorating all methods automat- 
ically. These are just two of the many metaclass roles, which unfortunately consume 
the space we have left for this chapter; again, you should consult the Web for more 
advanced applications. These examples are representative of metaclasses in action, 
though, and they suffice to illustrate the basics. 


Moreover, both give us an opportunity to contrast class decorators and metaclasses— 
our first example compares metaclass- and decorator-based implementations of class 
augmentation and instance wrapping, and the second applies a decorator with a 
metaclass first and then with another decorator. As you'll see, the two tools are often 
interchangeable, and even complementary. 


Manual Augmentation 


Earlier in this chapter, we looked at skeleton code that augmented classes by adding 
methods to them in various ways. As we saw, simple class-based inheritance suffices if 
the extra methods are statically known when the class is coded. Composition via object 
embedding can often achieve the same effect too. For more dynamic scenarios, though, 
other techniques are sometimes required—helper functions can usually suffice, but 
metaclasses provide an explicit structure and minimize the maintenance costs of 
changes in the future. 


Let’s put these ideas in action here with working code. Consider the following example 
of manual class augmentation—it adds two methods to two classes, after they have 
been created: 


# Extend manually - adding new methods to classes

class Client1:
    def __init__(self, value):
        self.value = value
    def spam(self):
        return self.value * 2

class Client2:
    value = 'ni?'

def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

Client1.eggs = eggsfunc
Client1.ham  = hamfunc

Client2.eggs = eggsfunc
Client2.ham  = hamfunc

X = Client1('Ni!')
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))

Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))


This works because methods can always be assigned to a class after it’s been created, 
as long as the methods assigned are functions with an extra first argument to receive 
the subject self instance; this argument can be used to access state information 
accessible from the class instance, even though the function is defined independently of 
the class. 


When this code runs, we receive the output of a method coded inside the first class, as 
well as the two methods added to the classes after the fact: 

Ni!Ni!
Ni!Ni!Ni!Ni!
baconham
ni?ni?ni?ni?
baconham


This scheme works well in isolated cases and can be used to fill out a class arbitrarily 
at runtime. It suffers from a potentially major downside, though: we have to repeat the 
augmentation code for every class that needs these methods. In our case, it wasn’t too 
onerous to add the two methods to both classes, but in more complex scenarios this 
approach can be time-consuming and error-prone. If we ever forget to do this 
consistently, or we ever need to change the augmentation, we can run into problems. 
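One intermediate step is the helper-function approach mentioned earlier in the chapter. The sketch below is an illustration only; the extend name is hypothetical and does not appear in the original code. It moves the augmentation into one place, but still relies on someone remembering to call the helper for every class:

```python
def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

def extend(aClass):                  # Hypothetical helper: one copy of the logic
    aClass.eggs = eggsfunc
    aClass.ham = hamfunc
    return aClass

class Client1:
    def __init__(self, value):
        self.value = value

class Client2:
    value = 'ni?'

extend(Client1)                      # Still a manual step per class
extend(Client2)

print(Client1('Ni!').eggs())
print(Client2().ham('bacon'))
```

This removes the duplicated assignments, but not the risk of forgetting the call for a given class.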


Metaclass-Based Augmentation 


Although manual augmentation works, in larger programs it would be better if we could 
apply such changes to an entire set of classes automatically. That way, we'd avoid the 
chance of the augmentation being botched for any given class. Moreover, coding the 
augmentation in a single location better supports future changes—all classes in the set 
will pick up changes automatically. 


One way to meet this goal is to use metaclasses. If we code the augmentation in a 
metaclass, every class that declares that metaclass will be augmented uniformly and 
correctly and will automatically pick up any changes made in the future. The following 
code demonstrates: 


# Extend with a metaclass - supports future changes better

def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

class Extender(type):
    def __new__(meta, classname, supers, classdict):
        classdict['eggs'] = eggsfunc
        classdict['ham']  = hamfunc
        return type.__new__(meta, classname, supers, classdict)

class Client1(metaclass=Extender):
    def __init__(self, value):
        self.value = value
    def spam(self):
        return self.value * 2

class Client2(metaclass=Extender):
    value = 'ni?'

X = Client1('Ni!')
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))

Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))


This time, both of the client classes are extended with the new methods because they 
are instances of a metaclass that performs the augmentation. When run, this version’s 
output is the same as before: we haven’t changed what the code does, we’ve just 
refactored it to encapsulate the augmentation more cleanly: 

Ni!Ni!
Ni!Ni!Ni!Ni!
baconham
ni?ni?ni?ni?
baconham


Notice that the metaclass in this example still performs a fairly static task: adding two 
known methods to every class that declares it. In fact, if all we need to do is always add 
the same two methods to a set of classes, we might as well code them in a normal 
superclass and inherit in subclasses. In practice, though, the metaclass structure sup- 
ports much more dynamic behavior. For instance, the subject class might also be con- 
figured based upon arbitrary logic at runtime: 


# Can also configure class based on runtime tests

class MetaExtend(type):
    def __new__(meta, classname, supers, classdict):
        if sometest():
            classdict['eggs'] = eggsfunc1
        else:
            classdict['eggs'] = eggsfunc2
        if someothertest():
            classdict['ham']  = hamfunc
        else:
            classdict['ham']  = lambda *args: 'Not supported'
        return type.__new__(meta, classname, supers, classdict)
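The skeleton above leaves sometest, someothertest, and the eggs functions undefined. To make the idea concrete, here is one runnable sketch under assumed tests: a class-body marker named big selects the eggs function, and ham is supported only when the class's value attribute is a string. These tests, and the names eggs_times4 and eggs_times2, are hypothetical stand-ins chosen for illustration:

```python
def eggs_times4(obj):
    return obj.value * 4

def eggs_times2(obj):
    return obj.value * 2

def hamfunc(obj, value):
    return value + 'ham'

class MetaExtend(type):
    def __new__(meta, classname, supers, classdict):
        # Assumed test 1: a 'big' marker in the class body picks the eggs method
        if classdict.get('big', False):
            classdict['eggs'] = eggs_times4
        else:
            classdict['eggs'] = eggs_times2
        # Assumed test 2: ham is supported only for string-valued classes
        if isinstance(classdict.get('value'), str):
            classdict['ham'] = hamfunc
        else:
            classdict['ham'] = lambda *args: 'Not supported'
        return type.__new__(meta, classname, supers, classdict)

class Client1(metaclass=MetaExtend):
    big = True
    value = 'ni?'

class Client2(metaclass=MetaExtend):
    value = 3

print(Client1().eggs(), Client1().ham('bacon'))
print(Client2().eggs(), Client2().ham('bacon'))
```

Each client class is configured differently at creation time, even though both name the same metaclass.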


Metaclasses Versus Class Decorators: Round 2 


Just in case this chapter has not yet managed to make your head explode, keep in mind 
again that the prior chapter’s class decorators often overlap with this chapter’s meta- 
classes in terms of functionality. This derives from the fact that: 


* Class decorators rebind class names to the result of a function at the end of a 
class statement. 


* Metaclasses work by routing class object creation through an object at the end of 
a class statement. 


Although these are slightly different models, in practice they can usually achieve the 
same goals, albeit in different ways. In fact, class decorators can be used to manage 
both instances of a class and the class itself. While decorators can manage classes nat- 
urally, though, it’s somewhat less straightforward for metaclasses to manage instances. 
Metaclasses are probably best used for class object management. 
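To see how the two hooks relate in time, the following sketch (with hypothetical Meta and decorator names, and an events list added purely for inspection) applies both to one class. The metaclass runs first, because it creates the class object; the decorator then receives that finished class and its result is bound to the class name:

```python
events = []

class Meta(type):
    def __new__(meta, classname, supers, classdict):
        events.append('metaclass creates ' + classname)
        return type.__new__(meta, classname, supers, classdict)

def decorator(aClass):
    events.append('decorator receives ' + aClass.__name__)
    return aClass                        # Name Spam is rebound to this result

@decorator
class Spam(metaclass=Meta):
    pass

print(events)
```

Both steps happen at the end of the class statement, which is why the two tools overlap so often in practice.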


Decorator-based augmentation 


For example, the prior section’s metaclass example, which adds methods to a class on 
creation, can also be coded as a class decorator; in this mode, decorators roughly 
correspond to the __init__ method of metaclasses, since the class object has already been 
created by the time the decorator is invoked. As with metaclasses, the original 
class type is retained, since no wrapper object layer is inserted. The output of the 
following is the same as that of the prior metaclass code: 


# Extend with a decorator: same as providing __init__ in a metaclass

def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

def Extender(aClass):
    aClass.eggs = eggsfunc              # Manages class, not instance
    aClass.ham  = hamfunc               # Equiv to metaclass __init__
    return aClass

@Extender
class Client1:                          # Client1 = Extender(Client1)
    def __init__(self, value):          # Rebound at end of class stmt
        self.value = value
    def spam(self):
        return self.value * 2

@Extender
class Client2:
    value = 'ni?'

X = Client1('Ni!')                      # X is a Client1 instance
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))

Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))


In other words, at least in certain cases, decorators can manage classes as easily as 
metaclasses. The converse isn’t quite so straightforward, though; metaclasses can be 
used to manage instances, but only with a certain amount of magic. The next section 
demonstrates. 
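Before turning to instance management, the correspondence noted above between a class decorator and a metaclass's __init__ can be sketched as follows. This variant is an illustration rather than code from the original text: it reuses the Extender name, but does its work in __init__, where the class object already exists, just as it does when a decorator is invoked:

```python
def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

class Extender(type):
    def __init__(Class, classname, supers, classdict):
        # The class has already been created by type.__new__ at this point
        Class.eggs = eggsfunc
        Class.ham = hamfunc

class Client1(metaclass=Extender):
    def __init__(self, value):
        self.value = value

X = Client1('Ni!')
print(X.eggs())
print(X.ham('bacon'))
print(type(X) is Client1)                # No wrapper layer is inserted
```

As in the decorator version, the methods are attached to an existing class object by attribute assignment instead of being added to the class dictionary before creation.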


Managing instances instead of classes 


As we’ve just seen, class decorators can often serve the same class-management role as 
metaclasses. Metaclasses can often serve the same instance-management role as 
decorators, too, but this is a bit more complex. That is: 


* Class decorators can manage both classes and instances. 


* Metaclasses can manage both classes and instances, but instances take extra work. 


That said, certain applications may be better coded in one or the other. For example, 
consider the following class decorator example from the prior chapter; it’s used to print 
a trace message whenever any normally named attribute of a class instance is fetched: 


# Class decorator to trace external instance attribute fetches

def Tracer(aClass):                                   # On @ decorator
    class Wrapper:
        def __init__(self, *args, **kargs):           # On instance creation
            self.wrapped = aClass(*args, **kargs)     # Use enclosing scope name
        def __getattr__(self, attrname):
            print('Trace:', attrname)                 # Catches all but .wrapped
            return getattr(self.wrapped, attrname)    # Delegate to wrapped object
    return Wrapper

@Tracer
class Person:                                         # Person = Tracer(Person)
    def __init__(self, name, hours, rate):            # Wrapper remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate                              # In-method fetch not traced
    def pay(self):
        return self.hours * self.rate

bob = Person('Bob', 40, 50)                           # bob is really a Wrapper
print(bob.name)                                       # Wrapper embeds a Person
print(bob.pay())                                      # Triggers __getattr__


When this code is run, the decorator uses class name rebinding to wrap instance objects 
in an object that produces the trace lines in the following output: 

Trace: name
Bob
Trace: pay
2000


Although it’s possible for a metaclass to achieve the same effect, it seems less straight- 
forward conceptually. Metaclasses are designed explicitly to manage class object cre- 
ation, and they have an interface tailored for this purpose. To use a metaclass to manage 
instances, we have to rely on a bit more magic. The following metaclass has the same 
effect and output as the prior decorator: 


# Manage instances like the prior example, but with a metaclass

def Tracer(classname, supers, classdict):             # On class creation call
    aClass = type(classname, supers, classdict)       # Make client class
    class Wrapper:
        def __init__(self, *args, **kargs):           # On instance creation
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):
            print('Trace:', attrname)                 # Catches all but .wrapped
            return getattr(self.wrapped, attrname)    # Delegate to wrapped object
    return Wrapper

class Person(metaclass=Tracer):                       # Make Person with Tracer
    def __init__(self, name, hours, rate):            # Wrapper remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate                              # In-method fetch not traced
    def pay(self):
        return self.hours * self.rate

bob = Person('Bob', 40, 50)                           # bob is really a Wrapper
print(bob.name)                                       # Wrapper embeds a Person
print(bob.pay())                                      # Triggers __getattr__


This works, but it relies on two tricks. First, it must use a simple function instead of a 
class, because type subclasses must adhere to object creation protocols. Second, it must 
create the subject class itself by calling type directly; it needs to return an instance 
wrapper, but metaclasses are also responsible for creating and returning the subject 
class. Really, we’re using the metaclass protocol to imitate decorators in this example, 
rather than vice versa; because both run at the conclusion of a class statement, in many 
roles they are just variations on a theme. This metaclass version produces the same 
output as the decorator when run live: 

Trace: name
Bob
Trace: pay
2000


You should study both versions of these examples for yourself to weigh their tradeoffs. 
In general, though, metaclasses are probably best suited to class management, due to 
their design; class decorators can manage either instances or classes, though they may 
not be the best option for more advanced metaclass roles that we don’t have space to 
cover in this book (if you want to learn more about decorators and metaclasses after 
reading this chapter, search the Web or Python’s standard manuals). The next section 
concludes this chapter with one more common use case: applying operations to a 
class’s methods automatically. 


Example: Applying Decorators to Methods 


As we saw in the prior section, because they are both run at the end of a class statement, 
metaclasses and decorators can often be used interchangeably, albeit with different 
syntax. The choice between the two is arbitrary in many contexts. It’s also possible to 
use them in combination, as complementary tools. In this section, we’ll explore an 
example of just such a combination: applying a function decorator to all the methods 
of a class. 


Tracing with Decoration Manually 


In the prior chapter we coded two function decorators, one that traced and counted all 
calls made to a decorated function and another that timed such calls. They took various 
forms there, some of which were applicable to both functions and methods and some 
of which were not. The following collects both decorators’ final forms into a module 
file for reuse and reference here: 


# File mytools.py: assorted decorator tools

def tracer(func):                        # Use function, not class with __call__
    calls = 0                            # Else self is decorator instance only
    def onCall(*args, **kwargs):
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return onCall

import time

# Note: time.perf_counter is used here because time.clock was removed in Python 3.8

def timer(label='', trace=True):             # On decorator args: retain args
    def onDecorator(func):                   # On @: retain decorated func
        def onCall(*args, **kargs):          # On calls: call original
            start = time.perf_counter()      # State is scopes + func attr
            result = func(*args, **kargs)
            elapsed = time.perf_counter() - start
            onCall.alltime += elapsed
            if trace:
                format = '%s%s: %.5f, %.5f'
                values = (label, func.__name__, elapsed, onCall.alltime)
                print(format % values)
            return result
        onCall.alltime = 0
        return onCall
    return onDecorator


As we learned in the prior chapter, to use these decorators manually, we simply import 
them from the module and code the decoration @ syntax before each method we wish 
to trace or time: 


from mytools import tracer

class Person:
    @tracer
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay

    @tracer
    def giveRaise(self, percent):            # giveRaise = tracer(giveRaise)
        self.pay *= (1.0 + percent)          # onCall remembers giveRaise

    @tracer
    def lastName(self):                      # lastName = tracer(lastName)
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)                           # Runs onCall(sue, .10)
print(sue.pay)
print(bob.lastName(), sue.lastName())        # Runs onCall(bob), remembers lastName


When this code is run, we get the following output—calls to decorated methods are 
routed to logic that intercepts and then delegates the call, because the original method 
names have been bound to the decorator: 

call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.0
call 1 to lastName
call 2 to lastName
Smith Jones


Tracing with Metaclasses and Decorators 


The manual decoration scheme of the prior section works, but it requires us to add 
decoration syntax before each method we wish to trace and to later remove that syntax 
when we no longer desire tracing. If we want to trace every method of a class, this can 
become tedious in larger programs. It would be better if we could somehow apply the 
tracer decorator to all of a class’s methods automatically. 


With metaclasses, we can do exactly that: because they are run when a class is 
constructed, they are a natural place to add decoration wrappers to a class’s methods. By 
scanning the class’s attribute dictionary and testing for function objects there, we can 
automatically run methods through the decorator and rebind the original names to the 
results. The effect is the same as the automatic method name rebinding of decorators, 
but we can apply it more globally: 


# Metaclass that adds tracing decorator to every method of a client class

from types import FunctionType
from mytools import tracer

class MetaTrace(type):
    def __new__(meta, classname, supers, classdict):
        for attr, attrval in classdict.items():
            if type(attrval) is FunctionType:                      # Method?
                classdict[attr] = tracer(attrval)                  # Decorate it
        return type.__new__(meta, classname, supers, classdict)    # Make class

class Person(metaclass=MetaTrace):
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print(sue.pay)
print(bob.lastName(), sue.lastName())


When this code is run, the results are the same as before—calls to methods are routed 
to the tracing decorator first for tracing, and then propagated on to the original method: 

call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.0
call 1 to lastName
call 2 to lastName
Smith Jones


The result you see here is a combination of decorator and metaclass work: the 
metaclass automatically applies the function decorator to every method at class creation 
time, and the function decorator automatically intercepts method calls in order to print 
the trace messages in this output. The combination “just works,” thanks to the 
generality of both tools. 


Applying Any Decorator to Methods 


The prior metaclass example works for just one specific function decorator—tracing. 
However, it’s trivial to generalize this to apply any decorator to all the methods of a 
class. All we have to do is add an outer scope layer to retain the desired decorator, much 
like we did for decorators in the prior chapter. The following, for example, codes such 
a generalization and then uses it to apply the tracer decorator again: 


# Metaclass factory: apply any decorator to all methods of a class

from types import FunctionType
from mytools import tracer, timer

def decorateAll(decorator):
    class MetaDecorate(type):
        def __new__(meta, classname, supers, classdict):
            for attr, attrval in classdict.items():
                if type(attrval) is FunctionType:
                    classdict[attr] = decorator(attrval)
            return type.__new__(meta, classname, supers, classdict)
    return MetaDecorate

class Person(metaclass=decorateAll(tracer)):     # Apply a decorator to all
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print(sue.pay)
print(bob.lastName(), sue.lastName())


When this code is run as it is, the output is again the same as that of the previous 
examples—we’re still ultimately decorating every method in a client class with the 
tracer function decorator, but we’re doing so in a more generic fashion: 

call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.0
call 1 to lastName
call 2 to lastName
Smith Jones


Now, to apply a different decorator to the methods, we can simply replace the decorator 
name in the class header line. To use the timer function decorator shown earlier, for 
example, we could use either of the last two header lines in the following when defining 
our class: the first accepts the timer’s default arguments, and the second specifies label 
text: 


class Person(metaclass=decorateAll(tracer)): # Apply tracer 
class Person(metaclass=decorateAll(timer())): # Apply timer, defaults 
class Person(metaclass=decorateAll(timer(label='**'))): # Decorator arguments 


Notice that this scheme cannot support nondefault decorator arguments differing per 
method, but it can pass in decorator arguments that apply to all methods, as done here. 
To test, use the last of these metaclass declarations to apply the timer, and add the 
following lines at the end of the script: 


# If using timer: total time per method

print('-'*40)
print('%.5f' % Person.__init__.alltime)
print('%.5f' % Person.giveRaise.alltime)
print('%.5f' % Person.lastName.alltime)


The new output is as follows—the metaclass wraps methods in timer decorators now, 
so we can tell how long each and every call takes, for every method of the class: 

**__init__: 0.00001, 0.00001
**__init__: 0.00001, 0.00002
Bob Smith Sue Jones
**giveRaise: 0.00001, 0.00001
110000.0
**lastName: 0.00001, 0.00001
**lastName: 0.00001, 0.00002
Smith Jones
----------------------------------------
0.00002
0.00001
0.00002


Metaclasses Versus Class Decorators: Round 3 


Class decorators intersect with metaclasses here, too. The following version replaces 
the preceding example’s metaclass with a class decorator. It defines and uses a class 
decorator that applies a function decorator to all methods of a class. Although the prior 
sentence may sound more like a Zen statement than a technical description, this 
all works quite naturally—Python’s decorators support arbitrary nesting and 
combinations: 


# Class decorator factory: apply any decorator to all methods of a class

from types import FunctionType
from mytools import tracer, timer

def decorateAll(decorator):
    def DecoDecorate(aClass):
        for attr, attrval in aClass.__dict__.items():
            if type(attrval) is FunctionType:
                setattr(aClass, attr, decorator(attrval))        # Not __dict__
        return aClass
    return DecoDecorate

@decorateAll(tracer)                          # Use a class decorator
class Person:                                 # Applies func decorator to methods
    def __init__(self, name, pay):            # Person = decorateAll(..)(Person)
        self.name = name                      # Person = DecoDecorate(Person)
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print(sue.pay)
print(bob.lastName(), sue.lastName())


When this code is run as it is, the class decorator applies the tracer function decorator 
to every method and produces a trace message on calls (the output is the same as that 
of the preceding metaclass version of this example): 

call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.0
call 1 to lastName
call 2 to lastName
Smith Jones


Notice that the class decorator returns the original, augmented class, not a wrapper 
layer for it (as is common when wrapping instance objects instead). As with the metaclass 
version, we retain the type of the original class: an instance of Person is an instance of 
Person, not of some wrapper class. In fact, this class decorator deals with class creation 
only; instance creation calls are not intercepted at all. 


This distinction can matter in programs that require type testing for instances to yield 
the original class, not a wrapper. When augmenting a class instead of an instance, class 
decorators can retain the original class type. The class’s methods are not their original 
functions because they are rebound to decorators, but this is less important in practice, 
and it’s true in the metaclass alternative as well. 
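The difference in retained type can be observed directly. The sketch below is self-contained and makes a few assumptions for the sake of illustration: its tracer is a simplified stand-in for the one in mytools (it counts calls in a function attribute instead of printing), and the Wrap decorator and Employee class are hypothetical names used only for contrast:

```python
from types import FunctionType

def tracer(func):                          # Simplified stand-in for mytools.tracer
    def onCall(*args, **kwargs):
        onCall.calls += 1
        return func(*args, **kwargs)
    onCall.calls = 0
    return onCall

def decorateAll(decorator):                # Augments the class in place
    def DecoDecorate(aClass):
        for attr, attrval in list(aClass.__dict__.items()):
            if type(attrval) is FunctionType:
                setattr(aClass, attr, decorator(attrval))
        return aClass
    return DecoDecorate

def Wrap(aClass):                          # Hypothetical instance wrapper
    class Wrapper:
        def __init__(self, *args, **kargs):
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):
            return getattr(self.wrapped, attrname)
    return Wrapper

@decorateAll(tracer)
class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

@Wrap
class Employee:
    def __init__(self, name):
        self.name = name

bob = Person('Bob Smith')
sue = Employee('Sue Jones')
print(type(bob).__name__)                  # Original class type is retained
print(type(sue).__name__)                  # Instance is really a Wrapper
print(bob.lastName(), Person.lastName.calls)
```

The augmented class still produces genuine Person instances, while the wrapped class produces Wrapper objects that merely delegate to an embedded instance. The methods of Person are the decorator's onCall functions, as the call counter shows.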


Also note that, like the metaclass version, this structure cannot support function 
decorator arguments that differ per method, but it can handle such arguments if they apply 
to all methods. To use this scheme to apply the timer decorator, for example, either of 
the last two decoration lines in the following will suffice if coded just before our class 
definition: the first uses decorator argument defaults, and the second provides one 
explicitly: 


@decorateAll(tracer)                    # Decorate all with tracer
@decorateAll(timer())                   # Decorate all with timer, defaults
@decorateAll(timer(label='@@'))         # Same but pass a decorator argument


As before, let’s use the last of these decorator lines and add the following at the end of 
the script to test our example with a different decorator: 


# If using timer: total time per method

print('-'*40)
print('%.5f' % Person.__init__.alltime)
print('%.5f' % Person.giveRaise.alltime)
print('%.5f' % Person.lastName.alltime)


The same sort of output appears—for every method we get timing data for each and 
all calls, but we’ve passed a different label argument to the timer decorator: 


@@__init__: 0.00001, 0.00001
@@__init__: 0.00001, 0.00002
Bob Smith Sue Jones
@@giveRaise: 0.00001, 0.00001
110000.0
@@lastName: 0.00001, 0.00001
@@lastName: 0.00001, 0.00002
Smith Jones
----------------------------------------
0.00002
0.00001
0.00002


As you can see, metaclasses and class decorators are not only often interchangeable, 
but also commonly complementary. Both provide advanced but powerful ways to cus- 
tomize and manage both class and instance objects, because both ultimately allow you 
to insert code into the class creation process. Although some more advanced applica- 
tions may be better coded with one or the other, the way you choose or combine these 
two tools in many cases is largely up to you. 


“Optional” Language Features 


I included a quote near the start of this chapter about metaclasses not being of interest 
to 99% of Python programmers, to underscore their relative obscurity. That statement 
is not quite accurate, though, and not just numerically so. 


The quote’s author is a friend of mine from the early days of Python, and I don’t mean 
to pick on anyone unfairly. Moreover, I’ve often made such statements about language 
feature obscurity myself—in this very book, in fact. 


The problem, though, is that such statements really only apply to people who work 
alone and only ever use code that they’ve written themselves. As soon as an “optional” 
advanced language feature is used by anyone in an organization, it is no longer 
optional; it is effectively imposed on everyone in the organization. The same holds 
true for externally developed software you use in your systems: if the software’s author 
uses an advanced language feature, it’s no longer entirely optional for you, because you 
have to understand the feature to use or change the code. 


This observation applies to all the advanced tools listed near the beginning of this 
chapter—decorators, properties, descriptors, metaclasses, and so on. If any person or 
program you need to work with uses them, they automatically become part of your 
required knowledge base too. That is, nothing is truly “optional” if nothing is truly op- 
tional. Most of us don’t get to pick and choose. 


This is why some Python old-timers (myself included) sometimes lament that Python 
seems to have grown larger and more complex over time. New features added by vet- 
erans seem to have raised the intellectual bar for newcomers. Although Python’s core 
ideas, like dynamic typing and built-in types, have remained essentially the same, its 
advanced additions can become required reading for any Python programmer. I chose 
to cover these topics here for this reason, despite the omission of most in prior editions. 
It’s not possible to skip the advanced stuff if it’s in code you have to understand. 


On the other hand, many new learners can pick up advanced topics as needed. And 
frankly, application programmers tend to spend most of their time dealing with libraries 
and extensions, not advanced and sometimes arcane language features. For instance, 
the book Programming Python, a follow-up to this one, deals mostly with the marriage 
of Python to application libraries for tasks such as GUIs, databases, and the Web, not 
with esoteric language tools. 


The flipside of this growth is that Python has become more powerful. When used well, 
tools like decorators and metaclasses are not only arguably “cool,” but allow creative 
programmers to build more flexible and useful APIs for other programmers to use. As 
we've seen, they can also provide good solutions to problems of encapsulation and 
maintenance. 


Whether this justifies the potential expansion of required Python knowledge is up to 
you to decide. Unfortunately, a person’s skill level often decides this issue by default— 
more advanced programmers like more advanced tools and tend to forget about their 
impact on other camps. Fortunately, though, this isn’t an absolute; good programmers 
also understand that simplicity is good engineering, and advanced tools should be used 
only when warranted. This is true in any programming language, but especially in a 
language like Python that is frequently exposed to new or novice programmers as an 
extension tool. 


If you’re still not buying this, keep in mind that there are very many Python users who 
are not comfortable with even basic OOP and classes. Trust me on this; I’ve met thou- 
sands of them. Python-based systems that require their users to master the nuances of 
metaclasses, decorators, and the like should probably scale their market expectations 
accordingly. 


Chapter Summary 


In this chapter, we studied metaclasses and explored examples of them in action. 
Metaclasses allow us to tap into the class creation protocol of Python, in order to man- 
age or augment user-defined classes. Because they automate this process, they can pro- 
vide better solutions for API writers than manual code or helper functions; because 
they encapsulate such code, they can minimize maintenance costs better than some 
other approaches. 


Along the way, we also saw how the roles of class decorators and metaclasses often 
intersect: because both run at the conclusion of a class statement, they can sometimes 
be used interchangeably. Class decorators can be used to manage both class and in- 
stance objects; metaclasses can, too, although they are more directly targeted toward 
classes. 
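
As a quick reminder of how the two hooks compare, here is a minimal sketch in Python 3.X syntax. The names tagged, Tagger, A, and B are made up for illustration, and the example only attaches an attribute to each class; it is not one of this chapter's examples.

```python
# A class decorator: runs after the class statement and returns the class
def tagged(cls):
    cls.tag = 'decorated'              # augment the finished class object
    return cls                         # the class name is rebound to this result

# A metaclass: a subclass of type that customizes class creation itself
class Tagger(type):
    def __new__(meta, name, bases, namespace):
        cls = super().__new__(meta, name, bases, namespace)
        cls.tag = 'metaclassed'        # augment the class right after creating it
        return cls

@tagged
class A:
    pass

class B(metaclass=Tagger):             # Python 3.X class header syntax
    pass

print(A.tag, type(A).__name__)         # decorated type
print(B.tag, type(B).__name__)         # metaclassed Tagger
```

Both classes end up with a tag attribute, but only B is an instance of something other than type, which is one way to see that the metaclass took part in creating the class rather than just post-processing it.
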


Since this chapter covered an advanced topic, we’ll work through just a few quiz ques- 
tions to review the basics (if you’ve made it this far in a chapter on metaclasses, you 
probably already deserve extra credit!). Because this is the last part of the book, we’ll 
forego the end-of-part exercises. Be sure to see the appendixes that follow for pointers 
on installation steps, and the solutions to the prior parts’ exercises. 


Once you finish the quiz, you’ve officially reached the end of this book. Now that you 
know Python inside and out, your next step, should you choose to take it, is to explore 
the libraries, techniques, and tools available in the application domains in which you 
work. Because Python is so widely used, you'll find ample resources for using it in 
almost any application you can think of—from GUIs, the Web, and databases to nu- 
meric programming, robotics, and system administration. 


This is where Python starts to become truly fun, but this is also where this book’s story 
ends, and others’ begin. For pointers on where to turn after this book, see the list of 
recommended follow-up texts in the Preface. Good luck with your journey. And of 
course, “Always look on the bright side of Life!” 


Test Your Knowledge: Quiz 


1. What is a metaclass? 
2. How do you declare the metaclass of a class? 
3. How do class decorators overlap with metaclasses for managing classes? 
4. How do class decorators overlap with metaclasses for managing instances? 
5. Would you rather count decorators or metaclasses amongst your weaponry? (And 
please phrase your answer in terms of a popular Monty Python skit.) 


Test Your Knowledge: Answers 


1. A metaclass is a class used to create a class. Normal classes are instances of the 
type class by default. Metaclasses are usually subclasses of the type class that 
redefine class creation protocol methods in order to customize the class creation 
call issued at the end of a class statement; they typically redefine the methods 
__new__ and __init__ to tap into the class creation protocol. Metaclasses can also 
be coded in other ways (as simple functions, for example), but they are responsible 
for making and returning an object for the new class. 


2. In Python 3.0 and later, use a keyword argument in the class header line: 
class C(metaclass=M). In Python 2.X, use a class attribute instead: __metaclass__ 
= M. In 3.0, the class header line can also name normal superclasses (a.k.a. base 
classes) before the metaclass keyword argument. 


3. Because both are automatically triggered at the end of a class statement, class 
decorators and metaclasses can both be used to manage classes. Decorators rebind 
a class name to a callable’s result and metaclasses route class creation through a 
callable, but both hooks can be used for similar purposes. To manage classes, 
decorators simply augment and return the original class objects. Metaclasses aug- 
ment a class after they create it. 


4. Because both are automatically triggered at the end of a class statement, class 
decorators and metaclasses can both be used to manage class instances, by inserting 
a wrapper object to catch instance creation calls. Decorators may rebind the class 
name to a callable run on instance creation that retains the original class object. 
Metaclasses can do the same, but they must also create the class object, so their 
usage is somewhat more complex in this role. 


5. Our chief weapon is decorators...decorators and metaclasses...metaclasses and 
decorators.... Our two weapons are metaclasses and decorators...and ruthless ef- 
ficiency.... Our three weapons are metaclasses, decorators, and ruthless effi- 
ciency...and an almost fanatical devotion to Guido.... Our four...no.... Amongst our 
weapons.... Amongst our weaponry...are such elements as metaclasses, 
decorators... I’ll come in again... 


PART IX 


Appendixes 


APPENDIX A 
Installation and Configuration 


This appendix provides additional installation and configuration details as a resource 
for people new to such topics. 


Installing the Python Interpreter 


Because you need the Python interpreter to run Python scripts, the first step in using 
Python is usually installing Python. Unless one is already available on your machine, 
you'll need to fetch, install, and possibly configure a recent version of Python on your 
computer. You'll only need to do this once per machine, and if you will be running a 
frozen binary (described in Chapter 2) or self-installing system, you may not need to 
do much more. 


Is Python Already Present? 


Before you do anything else, check whether you already have a recent Python on your 
machine. If you are working on Linux, Mac OS X, or some Unix systems, Python is 
probably already installed on your computer, though it may be one or two releases 
behind the cutting edge. Here’s how to check: 


e On Windows, check whether there is a Python entry in the Start button’s All Pro- 
grams menu (at the bottom left of the screen). 


e On Mac OS X, open a Terminal window (Applications>Utilities>Terminal) and 
type python at the prompt. 


e On Linux and Unix, type python at a shell prompt (a.k.a. terminal window), and 
see what happens. Alternatively, try searching for “python” in the usual 
places—/usr/bin, /usr/local/bin, etc. 


If you find a Python, make sure it’s a recent version. Although any recent Python will 
do for most of this text, this edition focuses on Python 3.0 and 2.6 specifically, so you 
may want to install one of these to run some of the examples in this book. 
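
If a Python does start up, you can also ask it for its version directly. The following is a small sketch that should run under both 2.6 and 3.X; the function name recent_enough and its default of (2, 6) are my own choices for illustration, not anything Python defines.

```python
import sys

# sys.version_info begins with the (major, minor) release numbers
major, minor = sys.version_info[:2]
print('Running Python %d.%d' % (major, minor))

def recent_enough(required=(2, 6)):
    """Return True if the running interpreter is at least the required release."""
    return tuple(sys.version_info[:2]) >= tuple(required)

print(recent_enough())                 # True on 2.6 or anything later
```
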


Speaking of versions, I recommend starting out with Python 3.0 or later if you’re learn- 
ing Python anew and don’t need to deal with existing 2.X code; otherwise, you should 
generally use Python 2.6. Some popular Python-based systems still use older releases, 
though (2.5 is still widespread), so if you’re working with existing systems be sure to 
use a version relevant to your needs; the next section describes locations where you can 
fetch a variety of Python versions. 


Where to Get Python 


If there is no Python on your machine, you will need to install one yourself. The good 
news is that Python is an open source system that is freely available on the Web and 
very easy to install on most platforms. 


You can always fetch the latest and greatest standard Python release from 
http://www.python.org, Python’s official website. Look for the Downloads link on that page, and 
choose a release for the platform on which you will be working. You’ll find prebuilt 
self-installer files for Windows (run to install), Installer Disk Images for Mac OS X 
(installed per Mac conventions), the full source code distribution (typically compiled 
on Linux, Unix, or OS X machines to generate an interpreter), and more. 


Although Python is standard on Linux these days, you can also find RPMs for Linux 
on the Web (unpack them with rpm). Python’s website also has links to pages where 
versions for other platforms are maintained, either at Python.org itself or offsite. A 
Google web search is another great way to find Python packages. Among other plat- 
forms, you can find Python pre-built for iPods, Palm handhelds, Nokia cell phones, 
PlayStation and PSP, Solaris, AS/400, and Windows Mobile. 


If you find yourself pining for a Unix environment on a Windows machine, you might 
also be interested in installing Cygwin and its version of Python (see 
http://www.cygwin.com). Cygwin is a GPL-licensed library and toolset that provides full Unix functionality 
on Windows machines, and it includes a prebuilt Python that makes use of all the 
Unix tools provided. 


You can also find Python on CD-ROMs supplied with Linux distributions, included 
with some products and computer systems, and enclosed with some other Python 
books. These tend to lag behind the current release somewhat, but usually not seriously 
so. 


In addition, you can find Python in some free and commercial development bundles. 
For example, ActiveState distributes Python as part of its ActivePython, a package that 
combines standard Python with extensions for Windows development such as 
PyWin32, an IDE called PythonWin (described in Chapter 3), and other commonly 
used extensions. Python can also be had today in the Enthought Python Distribution— 
a package aimed at scientific computing needs—as well as in Portable Python, precon- 
figured to run directly from a portable device. Search the Web for details. 


Finally, if you are interested in alternative Python implementations, run a web search 
to check out Jython (the Python port to the Java environment) and IronPython (Python 
for the C#/.NET world), both of which are described in Chapter 2. Installation of these 
systems is beyond the scope of this book. 


Installation Steps 


Once you’ve downloaded Python, you need to install it. Installation steps are very 
platform-specific, but here are a few pointers for the major Python platforms: 


Windows 
On Windows, Python comes as a self-installer MSI program file—simply double- 
click on its file icon, and answer Yes or Next at every prompt to perform a default 
install. The default install includes Python’s documentation set and support for 
tkinter (Tkinter in Python 2.6) GUIs, shelve databases, and the IDLE development 
GUI. Python 3.0 and 2.6 are normally installed in the directories C:\Python30 and 
C:\Python26, though this can be changed at install time. 


For convenience, after the install Python shows up in the Start button’s All Pro- 
grams menu. Python’s menu there has five entries that give quick access to common 
tasks: starting the IDLE user interface, reading module documentation, starting an 
interactive session, reading Python’s standard manuals in a web browser, and un- 
installing. Most of these options involve concepts explored in detail elsewhere in 
this text. 


When installed on Windows, Python also by default automatically registers itself 
to be the program that opens Python files when their icons are clicked (a program 
launch technique described in Chapter 3). It is also possible to build Python from 
its source code on Windows, but this is not commonly done. 


One note for Windows Vista users: security features of some versions of Vista 
change some of the rules for using MSI installer files. Although this may be a 
nonissue by the time you read these words, see the sidebar “The Python MSI In- 
staller on Windows Vista” on page 1092 in this appendix for assistance if the 
current Python installer does not work, or does not place Python in the correct 
place on your machine. 


Linux 
On Linux, Python is available as one or more RPM files, which you unpack in the 
usual way (consult the RPM manpage for details). Depending on which RPMs you 
download, there may be one for Python itself, and another that adds support for 
tkinter GUIs and the IDLE environment. Because Linux is a Unix-like system, the 
next paragraph applies as well. 

Unix 
On Unix systems, Python is usually compiled from its full C source code distribu- 
tion. This usually only requires you to unpack the file and run simple config and 
make commands; Python configures its own build procedure automatically, 
according to the system on which it is being compiled. However, be sure to see the 
package’s README file for more details on this process. Because Python is open 
source, its source code may be used and distributed free of charge. 


On other platforms the installation details can differ widely, but they generally follow 
the platform’s normal conventions. Installing the “Pippy” port of Python for PalmOS, 
for example, requires a hotsync operation with your PDA, and Python for the Sharp 
Zaurus Linux-based PDA comes as one or more .ipk files, which you simply run to 
install it. Because additional install procedures for both executable and source forms 
are well documented, though, we’ll skip further details here. 


The Python MSI Installer on Windows Vista 


As I write this, the Python self-installer for Windows is an .msi installation file. This 
format works fine on Windows XP (simply double-click on the file, and it runs), but it 
can have issues on some versions of Windows Vista. In particular, running the MSI 
installer by clicking on it may cause Python to be installed at the root of the C: drive, 
instead of in the correct C:\PythonXX directory. Python still works in the root directory, 
but this is not the correct place to install it. 


This is a Vista security-related issue; in short, MSI files are not true executables, so they 
do not correctly inherit administrator permissions, even if run by the administrator 
user. Instead, MSI files are run via the Windows Registry—their filenames are associ- 
ated with the MSI installer program. 


This problem seems to be either Python- or Vista-version specific. On a recent laptop, 
for example, Python 2.6 and 3.0 installed without issue. To install Python 2.5.2 on my 
Vista-based OQO handheld, though, I had to use a command-line approach to force 
the required administrator permissions. 


If Python doesn’t install in the right place for you, here’s the workaround: go to your 
Start button, select the All Programs entry, choose Accessories, right-click on the Com- 
mand Prompt entry there, choose “Run as administrator,” and select Continue in the 
access control dialog. Now, within the Command Prompt window, issue a cd command 
to change to the directory where your Python MSI installer file resides (e.g., 
cd C:\user\downloads), and then run the MSI installer manually by typing a command 
line of the form msiexec /i python-2.5.1.msi. Finally, follow the usual GUI interactions 
to complete the install. 


Naturally, this behavior may change over time. This procedure may not be required in 
every version of Vista, and additional workarounds may be possible (such as disabling 
Vista security, if you dare). It’s also possible that the Python self-installer may eventually 
be provided in a different format that obviates this problem—as a true executable, for 
instance. Be sure to try your installer by simply clicking its icon to see if it works properly 
before attempting any workarounds. 


Configuring Python 


After you’ve installed Python, you may want to configure some system settings that 
impact the way Python runs your code. (If you are just getting started with the language, 
you can probably skip this section completely; there is usually no need to specify any 
system settings for basic programs.) 


Generally speaking, parts of the Python interpreter’s behavior can be configured with 
environment variable settings and command-line options. In this section, we’ll take a 
brief look at both, but be sure to see other documentation sources for more details on 
the topics we introduce here. 


Python Environment Variables 


Environment variables—known to some as shell variables, or DOS variables—are 
system-wide settings that live outside Python and thus can be used to customize the 
interpreter’s behavior each time it is run on a given computer. Python recognizes a 
handful of environment variable settings, but only a few are used often enough to war- 
rant explanation here. Table A-1 summarizes the main Python-related environment 
variable settings. 


Table A-1. Important environment variables 


Variable                   Role 
PATH (or path)             System shell search path (for finding “python”) 
PYTHONPATH                 Python module search path (for imports) 
PYTHONSTARTUP              Path to Python interactive startup file 
TCL_LIBRARY, TK_LIBRARY    GUI extension variables (tkinter) 


These variables are straightforward to use, but here are a few pointers: 


PATH 
The PATH setting lists a set of directories that the operating system searches for 
executable programs. It should normally include the directory where your Python 
interpreter lives (the python program on Unix, or the python.exe file on Windows). 


You don’t need to set this variable at all if you are willing to work in the directory 
where Python resides, or type the full path to Python in command lines. On Win- 
dows, for instance, the PATH is irrelevant if you run a cd C:\Python30 before running 
any code (to change to the directory where Python lives), or always type 
C:\Python30\python instead of just python (giving a full path). Also, note that 
PATH settings are mostly for launching programs from command lines; they are 
usually irrelevant when launching via icon clicks and IDEs. 


PYTHONPATH 

The PYTHONPATH setting serves a role similar to PATH: the Python interpreter consults 
the PYTHONPATH variable to locate module files when you import them in a program. 
If used, this variable is set to a platform-dependent list of directory names, sepa- 
rated by colons on Unix and semicolons on Windows. This list normally includes 
just your own source code directories. Its content is merged into the sys.path 
module import search path, along with the script’s directory, any path file settings, 
and standard library directories. 


You don’t need to set this variable unless you will be performing cross-directory 
imports—because Python always searches the home directory of the program’s 
top-level file automatically, this setting is required only if a module needs to import 
another module that lives in a different directory. See also the discussion of .pth 
path files later in this appendix for an alternative to PYTHONPATH. For more on the 
module search path, refer to Chapter 21. 


PYTHONSTARTUP 
If PYTHONSTARTUP is set to the pathname of a file of Python code, Python executes 
the file’s code automatically whenever you start the interactive interpreter, as 
though you had typed it at the interactive command line. This is a rarely used but 
handy way to make sure you always load certain utilities when working interac- 
tively; it saves an import. 


tkinter settings 
If you wish to use the tkinter GUI toolkit (named Tkinter in 2.6), you might have 
to set the two GUI variables in the last line of Table A-1 to the names of the source 
library directories of the Tcl and Tk systems (much like PYTHONPATH). However, 
these settings are not required on Windows systems (where tkinter support is 
installed alongside Python), and they’re usually not required elsewhere if Tcl and 
Tk reside in standard directories. 


Note that because these environment settings are external to Python itself, when you 
set them is usually irrelevant: this can be done before or after Python is installed, as 
long as they are set the way you require before Python is actually run. 
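
To see what these settings currently look like on your own machine, you can inspect them from Python itself. This is only a sketch: it reads the variables named in Table A-1 through os.environ, and it assumes Python 3.3 or later for the shutil.which call. The output depends entirely on your system.

```python
import os
import shutil
import sys

# PATH: directories the system shell searches for executables such as "python"
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
print(len(path_dirs), 'directories on PATH')

# shutil.which reports the file that PATH resolves a command to, or None
print('python resolves to:', shutil.which('python'))

# PYTHONPATH: optional; os.pathsep is ':' on Unix and ';' on Windows
pythonpath = os.environ.get('PYTHONPATH', '')
extra_dirs = [d for d in pythonpath.split(os.pathsep) if d]
print('PYTHONPATH entries:', extra_dirs)

# PYTHONSTARTUP: optional pathname of an interactive startup file
print('PYTHONSTARTUP:', os.environ.get('PYTHONSTARTUP', '(not set)'))

# sys.path is the merged module search path actually used for imports
for entry in sys.path:
    print('  ', entry)
```
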


Getting tkinter (and IDLE) GUI Support on Linux 


The IDLE interface described in Chapter 2 is a Python tkinter GUI program. The 
tkinter module (named Tkinter in 2.6) is a GUI toolkit, and it’s a complete, standard 
component of Python on Windows and some other platforms. On some Linux systems, 
though, the underlying GUI library may not be a standard installed component. To add 
GUI support to your Python on Linux if needed, try running a command line of the 
form yum install tkinter to automatically install tkinter’s underlying libraries. This should 
work on Linux distributions (and some other systems) on which the yum installation 
program is available. 
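
Before installing anything, it may be worth checking whether GUI support is already there. A simple test is to try the import; the helper name tkinter_available below is hypothetical, and the sketch only reports whether the module can be loaded, not whether a display is available.

```python
def tkinter_available():
    """Return True if the tkinter GUI module can be imported here."""
    try:
        import tkinter                 # module name in Python 3.X
    except ImportError:
        try:
            import Tkinter             # module name in Python 2.X
        except ImportError:
            return False
    return True

print('tkinter support present:', tkinter_available())
```
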


How to Set Configuration Options 


The way to set Python-related environment variables, and what to set them to, depends 
on the type of computer you’re working on. And again, remember that you won’t 
necessarily have to set these at all right away; especially if you’re working in IDLE 
(described in Chapter 3), configuration is not required up front. 


But suppose, for illustration, that you have generally useful module files in directories 
called utilities and package1 somewhere on your machine, and you want to be able to 
import these modules from files located in other directories. That is, to load a file called 
spam.py from the utilities directory, you want to be able to say: 
import spam 

from another file located anywhere on your computer. To make this work, you’ll have 
to configure your module search path one way or another to include the directory 
containing spam.py. Here are a few tips on this process. 
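
To make the goal concrete, the following sketch builds a throwaway utilities directory containing a one-line spam.py, and then starts a child interpreter with PYTHONPATH set to that directory. The temporary directory and the message printed by spam.py are invented for the demonstration, and subprocess.run assumes Python 3.5 or later.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/utilities/spam.py
    utilities = os.path.join(root, 'utilities')
    os.mkdir(utilities)
    with open(os.path.join(utilities, 'spam.py'), 'w') as f:
        f.write("print('spam imported')\n")

    # Copy the current environment and point PYTHONPATH at the directory
    env = dict(os.environ)
    env['PYTHONPATH'] = utilities

    # Run a child interpreter from a different directory and import spam
    result = subprocess.run(
        [sys.executable, '-c', 'import spam'],
        cwd=root, env=env, stdout=subprocess.PIPE, universal_newlines=True)

output = result.stdout.strip()
print(output)                          # spam imported
```

The import succeeds even though the child runs in a directory that does not contain spam.py, because the child's module search path includes the PYTHONPATH entry.
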


Unix/Linux shell variables 


On Unix systems, the way to set environment variables depends on the shell you use. 
Under the csh shell, you might add a line like the following in your .cshrc or .login file 
to set the Python module search path: 


setenv PYTHONPATH /usr/home/pycode/utilities:/usr/lib/pycode/package1 


This tells Python to look for imported modules in two user-defined directories. 
Alternatively, if you’re using the ksh shell, the setting might instead appear in your .kshrc 
file and look like this: 


export PYTHONPATH="/usr/home/pycode/utilities:/usr/lib/pycode/package1" 


Other shells may use different (but analogous) syntax. 


DOS variables (Windows) 


If you are using MS-DOS, or some older flavors of Windows, you may need to add an 
environment variable configuration command to your C:\autoexec.bat file, and reboot 
your machine for the changes to take effect. The configuration command on such ma- 
chines has a syntax unique to DOS: 


set PYTHONPATH=c:\pycode\utilities;d:\pycode\package1 


You can type such a command in a DOS console window, too, but the setting will then 
be active only for that one console window. Changing your .bat file makes the change 
permanent and global to all programs. 


Windows environment variable GUI 


On more recent versions of Windows, including XP and Vista, you can instead set 
PYTHONPATH and other variables via the system environment variable GUI without having 
to edit files or reboot. On XP, select the Control Panel, choose the System icon, pick 
the Advanced tab, and click the Environment Variables button to edit or add new 
variables (PYTHONPATH is usually a user variable). Use the same variable name and values 
syntax shown in the DOS set command earlier. The procedure is similar on Vista, but 
you may have to verify operations along the way. 


You do not need to reboot your machine, but be sure to restart Python if it’s open so 
that it picks up your changes—it configures its path at startup time only. If you’re 
working in a Windows Command Prompt window, you'll probably need to restart that 
to pick up your changes as well. 


Windows registry 


If you are an experienced Windows user, you may also be able to configure the module 
search path by using the Windows Registry Editor. Go to Start>Run... and type 
regedit. Assuming the typical registry tool is on your machine, you can then navigate 
to Python’s entries and make your changes. This is a delicate and error-prone proce- 
dure, though, so unless you’re familiar with the registry, I suggest using other options 
(indeed, this is akin to performing brain surgery on your computer, so be careful!). 


Path files 


Finally, if you choose to extend the module search path with a .pth file instead of the 
PYTHONPATH variable, you might instead code a text file that looks like the following on 
Windows (e.g., file C:\Python30\mypath.pth): 


c:\pycode\utilities 
d:\pycode\package1 


Its contents will differ per platform, and its container directory may differ per both 
platform and Python release. Python locates this file automatically when it starts up. 


Directory names in path files may be absolute, or relative to the directory containing 
the path file; multiple .pth files can be used (all their directories are added), and .pth 
files may appear in various automatically checked directories that are platform- and 
version-specific. In general, a Python release numbered Python N.M typically looks for 
path files in C:\PythonNM and C:\PythonNM\Lib\site-packages on Windows, and 
in /usr/local/lib/pythonN.M/site-packages and /usr/local/lib/site-python on Unix and 
Linux. See Chapter 21 for more on using path files to configure the sys.path import 
search path. 
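
You can watch path-file processing happen with the standard library's site module, whose addsitedir function adds a directory to sys.path and processes any .pth files it finds there. Python does this on its own for its site directories at startup; calling it by hand on a temporary directory, as in this sketch, is only a way to observe the effect. The directory and file names are made up.

```python
import os
import site
import sys
import tempfile

saved_path = list(sys.path)            # so the demo leaves sys.path unchanged

with tempfile.TemporaryDirectory() as root:
    # Hypothetical source directory we want on the import search path
    utilities = os.path.join(root, 'pycode', 'utilities')
    os.makedirs(utilities)

    # A stand-in for an automatically checked directory such as site-packages
    site_dir = os.path.join(root, 'site-dir')
    os.mkdir(site_dir)

    # The path file lists one directory per line
    with open(os.path.join(site_dir, 'mypath.pth'), 'w') as f:
        f.write(utilities + '\n')

    # Process the .pth files found in site_dir
    site.addsitedir(site_dir)

    added = os.path.abspath(utilities) in sys.path
    print('utilities on sys.path:', added)

sys.path[:] = saved_path               # drop the entries for the deleted directories
```

Note that directories named in a path file are added only if they exist, which is why the sketch creates the utilities directory before writing the file.
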


Because environment settings are often optional, and because this isn’t a book on op- 
erating system shells, I'll defer to other sources for further details. Consult your system 
shell’s manpages or other documentation for more information, and if you have trouble 
figuring out what your settings should be, ask your system administrator or another 
local expert for help. 


Python Command-Line Options 


When you start Python from a system command line (a.k.a. a shell prompt), you can 
pass in a variety of option flags to control how Python runs. Unlike system-wide envi- 
ronment variables, command-line options can be different each time you run a script. 
The complete form of a Python command-line invocation in 3.0 looks like this (2.6 is 
roughly the same, with a few option differences): 


python [-bBdEhiOsSuvVWx?] [-c command | -m module-name | script | - ] [args] 


Most command lines only make use of the script and args parts of this format, to run 
a program’s source file with arguments to be used by the program itself. To illustrate, 
consider the following script file, main.py, which prints the command-line arguments 
list made available to the script as sys.argv: 

# File main.py 
import sys 
print(sys.argv) 


In the following command line, both python and main.py can also be complete directory 
paths, and the three arguments (a b -c) meant for the script show up in the sys.argv 
list. The first item in sys.argv is always the script file’s name, when it is known: 


c:\Python30> python main.py a b -c              # Most common: run a script file 
['main.py', 'a', 'b', '-c'] 


Other code format specification options allow you to specify Python code to be run on 
the command line itself (-c), to accept code to run from the standard input stream (a 
- means read from a pipe or redirected input stream file), and so on: 


c:\Python30> python -c "print(2 ** 100)" # Read code from command argument 
1267650600228229401496703205376 

c:\Python30> python -c "import main" # Import a file to run its code 

['-c'] 

c:\Python30> python - < main.py a b -c          # Read code from standard input 
['-', 'a', 'b', '-c'] 

c:\Python30> python - a b -c < main.py          # Same effect as prior line 
['-', 'a', 'b', '-c'] 


The -m code specification locates a module on Python’s module search path 
(sys.path) and runs it as a top-level script (as module __main__). Leave off the “.py” 
suffix here, since the filename is a module: 


c:\Python30> python -m main a b -c              # Locate/run module as script 
['c:\\Python30\\main.py', 'a', 'b', '-c'] 


The -m option also supports running modules in packages with relative import syntax, 
as well as modules located in .zip archives. This switch is commonly used to run the 
pdb debugger and profile profiler modules from a command line for a script invocation 
rather than interactively, though this usage mode seems to have changed somewhat in 
3.0 (profile appears to have been affected by the removal of execfile in 3.0, and pdb 
steps into superfluous input/output code in the new 3.0 io module): 


c:\Python30> python -m pdb main.py a b -c # Debug a script 
--Return-- 
> c:\python30\lib\io.py(762)closed()->False 

-> return self.raw.closed 

(Pdb) c 


c:\Python30> C:\python26\python -m pdb main.py a b -c # Better in 2.6? 
> c:\python30\main.py(1)<module>() 
-> import sys 


(Pdb) c 
c:\Python30> python -m profile main.py a b -c # Profile a script 
c:\Python30> python -m cProfile main.py a b -c # Low-overhead profiler 


Immediately after the “python” and before the designation of code to be run, Python 
accepts additional arguments that control its own behavior. These arguments are con- 
sumed by Python itself and are not meant for the script being run. For example, -O runs 
Python in optimized mode, -u forces standard streams to be unbuffered, and -i enters 
interactive mode after running a script: 


c:\Python30> python -u main.py a b -c # Unbuffered output streams 
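
One way to confirm that such flags are consumed by Python rather than passed to your code is to ask a child interpreter about its own state. The sketch below launches the running Python again with and without -O and prints the built-in __debug__ flag along with sys.flags.optimize; the helper name run_child is hypothetical, and the results noted in the comments assume that the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys

def run_child(*flags):
    """Run a one-line program in a child interpreter with the given flags."""
    code = 'import sys; print(__debug__, sys.flags.optimize)'
    cmd = [sys.executable] + list(flags) + ['-c', code]
    out = subprocess.check_output(cmd, universal_newlines=True)
    return out.strip()

normal = run_child()                   # no interpreter flags
optimized = run_child('-O')            # -O is consumed by Python itself

print(normal)                          # typically: True 0
print(optimized)                       # typically: False 1
```
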


Python 2.6 supports additional options that promote 3.0 compatibility (-3, -Q) and 
detecting inconsistent tab indentation usage, which is always detected and reported in 
3.0 (-t; see Chapter 12). See the Python manuals or reference texts for more details on 
available command-line options. Or better yet, ask Python itself—run a command-line 
form like this: 


c:\Python30> python -? 


to request Python’s help display, which documents available command-line options. 
If you deal with complex command lines, be sure to also check out the standard library 
modules getopt and optparse, which support more sophisticated command-line 
processing. 
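
As a rough illustration of the kind of processing getopt supports, here is a minimal sketch. The option letters (-c, -o), the --verbose long option, and the helper name parse_args are assumptions made up for this example, not options taken from any script in this book:

```python
import getopt

def parse_args(argv):
    """Split an argv-style list into an options dict and leftover positionals."""
    # "c" is a simple flag; "o:" expects a value; "verbose" is a long flag.
    # All three are hypothetical options chosen for illustration.
    opts, rest = getopt.getopt(argv, "co:", ["verbose"])
    return dict(opts), rest

options, positionals = parse_args(["-c", "-o", "out.txt", "--verbose", "a", "b"])
print(options)        # option names map to their values ('' for plain flags)
print(positionals)    # whatever follows the recognized options
```

In a real script you would normally pass sys.argv[1:] rather than a literal list.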


For More Help 


Python’s standard manual set today includes valuable pointers for usage on various 
platforms. The standard manual set is available in your Start button on Windows after 
Python is installed (option “Python Manuals”), and online at http://www.python.org.
Look for the manual set’s top-level section titled “Using Python” for more platform- 
specific pointers and hints, as well as up-to-date cross-platform environment and 
command-line details. 
As always, the Web is your friend, too, especially in a field that often evolves faster than
books like this can be updated. Given Python’s widespread adoption, chances are good 
that answers to any usage questions you may have can be found with a web search. 
APPENDIX B
Solutions to End-of-Part Exercises 


Part I, Getting Started


See “Test Your Knowledge: Part I Exercises” on page 70 in Chapter 3 for the exercises. 


1. Interaction. Assuming Python is configured properly, the interaction should look
something like the following (you can run this any way you like: in IDLE, from a
shell prompt, and so on):

% python
...copyright information lines...
>>> "Hello World!"
'Hello World!'
>>> # Use Ctrl-D or Ctrl-Z to exit, or close window

2. Programs. Your code (i.e., module) file module1.py and the operating system shell 
interactions should look like this: 


print('Hello module world!')


% python module1.py 
Hello module world! 


Again, feel free to run this other ways—by clicking the file’s icon, by using IDLE’s 
Run>Run Module menu option, and so on. 


3. Modules. The following interaction listing illustrates running a module file by im- 
porting it: 
% python 
>>> import module1 


Hello module world! 
>>> 


Remember that you will need to reload the module to run it again without stopping 
and restarting the interpreter. The question about moving the file to a different 
directory and importing it again is a trick question: if Python generates a 
module1.pyc file in the original directory, it uses that when you import the module, 
even if the source code (.py) file has been moved to a directory not in Python’s 
search path. The .pyc file is written automatically if Python has access to the source
file’s directory; it contains the compiled byte code version of a module. See Chap- 
ter 3 for more on modules. 


4. Scripts. Assuming your platform supports the #! trick, your solution will look like 
the following (although your #! line may need to list another path on your 
machine): 

#! /usr/local/bin/python          (or #!/usr/bin/env python)
print('Hello module world!')

% chmod +x module1.py

% module1.py
Hello module world!

5. Errors. The following interaction (run in Python 3.0) demonstrates the sorts of
error messages you’ll get when you complete this exercise. Really, you’re triggering
Python exceptions; the default exception-handling behavior terminates the running
Python program and prints an error message and stack trace on the screen.
The stack trace shows where you were in a program when the exception occurred
(if function calls are active when the error happens, the “Traceback” section displays
all active call levels). In Part VII, you will learn that you can catch exceptions
using try statements and process them arbitrarily; you’ll also see there that Python
includes a full-blown source code debugger for special error-detection requirements.
For now, notice that Python gives meaningful messages when programming
errors occur, instead of crashing silently:

% python 
>>> 2 ** 500 
32733906078961418700131896968275991522166420460430647894832913680961337964046745 
54883270092325904157150886684127560071009217256545885393053328527589376 
>>> 
>>> 1/0
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
ZeroDivisionError: int division or modulo by zero 
>>> 
>>> spam 
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
NameError: name 'spam' is not defined 


6. Breaks and cycles. When you type this code: 


L = [1, 2] 

L.append(L) 
you create a cyclic data structure in Python. In Python releases before 1.5.1, the
Python printer wasn’t smart enough to detect cycles in objects, and it would print
an unending stream of [1, 2, [1, 2, [1, 2, [1, 2, and so on, until you hit the
break-key combination on your machine (which, technically, raises a keyboard-interrupt
exception that prints a default message). Beginning with Python 1.5.1,
the printer is clever enough to detect cycles and prints [...] in place of the nested
self-reference (this list displays as [1, 2, [...]]) to let you
know that it has detected a loop in the object’s structure and avoided getting stuck
printing forever.


The reason for the cycle is subtle and requires information you will glean in
Part II, so this is something of a preview. But in short, assignments in Python always
generate references to objects, not copies of them. You can think of objects as
chunks of memory and of references as implicitly followed pointers. When you run
the first assignment above, the name L becomes a named reference to a two-item
list object (a pointer to a piece of memory). Python lists are really arrays of object
references, with an append method that changes the array in-place by tacking on
another object reference at the end. Here, the append call adds a reference to the
front of L at the end of L, which leads to the cycle illustrated in Figure B-1: a pointer
at the end of the list that points back to the front of the list.


Besides being printed specially, as you’ll learn in Chapter 6, cyclic objects must also
be handled specially by Python’s garbage collector, or their space will remain unreclaimed
even when they are no longer in use. Though rare in practice, in some
programs that traverse arbitrary objects or structures you might have to detect such
cycles yourself by keeping track of where you’ve been to avoid looping. Believe it
or not, cyclic data structures can sometimes be useful, despite their special-case
printing.
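
The points above can be checked directly. The following is a minimal sketch, not part of the original exercise; the helper name count_lists is hypothetical, and tracking visited objects by id() is just one simple way to avoid looping on a cycle:

```python
L = [1, 2]
L.append(L)              # the third item is a reference to L itself, not a copy

print(L[2] is L)         # the nested item and L are the same object
print(repr(L))           # the printer marks the cycle instead of recursing

def count_lists(obj, seen=None):
    """Count distinct list objects reachable from obj, visiting each only once."""
    if seen is None:
        seen = set()
    if not isinstance(obj, list) or id(obj) in seen:
        return 0         # not a list, or already visited: stop here
    seen.add(id(obj))    # remember where we've been
    return 1 + sum(count_lists(item, seen) for item in obj)

print(count_lists(L))    # the cycle is visited once, so this terminates
```

Without the seen set, the recursive traversal would follow the self-reference until Python's recursion limit is hit.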


[Figure B-1 diagram: a “Names” column referencing an “Objects” column]

Figure B-1. A cyclic object, created by appending a list to itself. By default, Python appends a reference
to the original list, not a copy of the list.


Part II, Types and Operations


See “Test Your Knowledge: Part II Exercises” on page 255 in Chapter 9 for the 
exercises. 


1. The basics. Here are the sorts of results you should get, along with a few comments
about their meaning. Again, note that ; is used in a few of these to squeeze more
than one statement onto a single line (the ; is a statement separator), and commas
build up tuples displayed in parentheses. Also keep in mind that the / division
result near the top differs in Python 2.6 and 3.0 (see Chapter 5 for details), and the
list wrapper around dictionary method calls is needed to display results in 3.0,
but not 2.6 (see Chapter 8):


# Numbers

>>> 2 ** 16                              # 2 raised to the power 16
65536
>>> 2 / 5, 2 / 5.0                       # Integer / truncates in 2.6, but not 3.0
(0.40000000000000002, 0.40000000000000002)

# Strings

>>> "spam" + "eggs"                      # Concatenation
'spameggs'
>>> S = "ham"
>>> "eggs " + S
'eggs ham'
>>> S * 5                                # Repetition
'hamhamhamhamham'
>>> S[:0]                                # An empty slice at the front -- [0:0]
''                                       # Empty of same type as object sliced

>>> "green %s and %s" % ("eggs", S)      # Formatting
'green eggs and ham'
>>> 'green {0} and {1}'.format('eggs', S)
'green eggs and ham'

# Tuples

>>> ('x',)[0]                            # Indexing a single-item tuple
'x'
>>> ('x', 'y')[1]                        # Indexing a 2-item tuple
'y'

# Lists

>>> L = [1,2,3] + [4,5,6]                # List operations
>>> L, L[:], L[:0], L[-2], L[-2:]
([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [], 5, [5, 6])
>>> ([1,2,3]+[4,5,6])[2:4]
[3, 4]
>>> [L[2], L[3]]                         # Fetch from offsets; store in a list
[3, 4]
>>> L.reverse(); L                       # Method: reverse list in-place
[6, 5, 4, 3, 2, 1]
>>> L.sort(); L                          # Method: sort list in-place
[1, 2, 3, 4, 5, 6]
>>> L.index(4)                           # Method: offset of first 4 (search)
3

# Dictionaries

>>> {'a':1, 'b':2}['b']                  # Index a dictionary by key
2
>>> D = {'x':1, 'y':2, 'z':3}
>>> D['w'] = 0                           # Create a new entry
>>> D['x'] + D['w']
1
>>> D[(1,2,3)] = 4                       # A tuple used as a key (immutable)
>>> D
{'w': 0, 'z': 3, 'y': 2, (1, 2, 3): 4, 'x': 1}
>>> list(D.keys()), list(D.values()), (1,2,3) in D     # Methods, key test
(['w', 'z', 'y', (1, 2, 3), 'x'], [0, 3, 2, 4, 1], True)

# Empties

>>> [[]], ["",[],(),{},None]             # Lots of nothings: empty objects
([[]], ['', [], (), {}, None])


2. Indexing and slicing. Indexing out of bounds (e.g., L[4]) raises an error; Python 
always checks to make sure that all offsets are within the bounds of a sequence. 


On the other hand, slicing out of bounds (e.g., L[-1000:100]) works because Python 
scales out-of-bounds slices so that they always fit (the limits are set to zero and the 
sequence length, if required). 


Extracting a sequence in reverse, with the lower bound greater than the higher 
bound (e.g., L[3:1]), doesn’t really work. You get back an empty slice ([ ]) because 
Python scales the slice limits to make sure that the lower bound is always less than 
or equal to the upper bound (e.g., L[3:1] is scaled to L[3:3], the empty insertion 
point at offset 3). Python slices are always extracted from left to right, even if you 
use negative indexes (they are first converted to positive indexes by adding the 
sequence length). Note that Python 2.3’s three-limit slices modify this behavior 
somewhat. For instance, L[3:1:-1] does extract from right to left: 

>>> L = [1, 2, 3, 4]
>>> L[4]
Traceback (innermost last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range
>>> L[-1000:100]
[1, 2, 3, 4]
>>> L[3:1]
[]
>>> L
[1, 2, 3, 4]
>>> L[3:1] = ['?']
>>> L
[1, 2, 3, '?', 4]

3. Indexing, slicing, and del. Your interaction with the interpreter should look some- 
thing like the following code. Note that assigning an empty list to an offset stores 
an empty list object there, but assigning an empty list to a slice deletes the slice. 
Slice assignment expects another sequence, or you'll get a type error; it inserts items 
inside the sequence assigned, not the sequence itself: 
>>> L = [1,2,3,4]
>>> L[2] = []
>>> L
[1, 2, [], 4]
>>> L[2:3] = []
>>> L
[1, 2, 4]
>>> del L[0]
>>> L
[2, 4]
>>> del L[1:]
>>> L
[2]
>>> L[1:2] = 1
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation


4. Tuple assignment. The values of X and Y are swapped. When tuples appear on the
left and right of an assignment symbol (=), Python assigns objects on the right to
targets on the left according to their positions. This is probably easiest to understand
by noting that the targets on the left aren’t a real tuple, even though they
look like one; they are simply a set of independent assignment targets. The items
on the right are a tuple, which gets unpacked during the assignment (the tuple
provides the temporary assignment needed to achieve the swap effect):
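
A minimal sketch of the swap follows. The names X and Y come from the exercise; the string values are assumptions chosen for illustration:

```python
X = 'spam'
Y = 'eggs'

# The right side is evaluated first, building the tuple ('eggs', 'spam');
# its items are then assigned to the targets on the left by position.
X, Y = Y, X

print(X, Y)
```

Because the tuple on the right holds both original values before any target is assigned, no explicit temporary variable is needed.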


5. Dictionary keys. Any immutable object can be used as a dictionary key, including
integers, tuples, strings, and so on. This really is a dictionary, even though some
of its keys look like integer offsets. Mixed-type keys work fine, too:


>>> D = {}
>>> D[1] = 'a'
>>> D[2] = 'b'
>>> D[(1, 2, 3)] = 'c'
>>> D
{1: 'a', 2: 'b', (1, 2, 3): 'c'}

6. Dictionary indexing. Indexing a nonexistent key (D['d']) raises an error; assigning
to a nonexistent key (D['d']='spam') creates a new dictionary entry. On the other
hand, out-of-bounds indexing for lists raises an error too, but so do out-of-bounds
assignments. Variable names work like dictionary keys; they must have already
been assigned when referenced, but they are created when first assigned. In fact,
variable names can be processed as dictionary keys if you wish (they’re made visible
in module namespace or stack-frame dictionaries):
>>> D = {'a':1, 'b':2, 'c':3}
>>> D['a']
1
>>> D['d']
Traceback (innermost last):
  File "<stdin>", line 1, in ?
KeyError: d
>>> D['d'] = 4
>>> D
{'b': 2, 'd': 4, 'a': 1, 'c': 3}
>>>
>>> L = [0, 1]
>>> L[2]
Traceback (innermost last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range
>>> L[2] = 3
Traceback (innermost last):
  File "<stdin>", line 1, in ?
IndexError: list assignment index out of range
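
The remark about variable names living in namespace dictionaries can be sketched as follows. This example builds a throwaway module object so that it does not depend on where the code is run; the module name 'demo' and the attribute names are assumptions made up for illustration:

```python
import types

mod = types.ModuleType('demo')     # hypothetical empty module object
mod.spam = 1                       # attribute assignment creates a name...

namespace = vars(mod)              # ...which appears as a key in the namespace dict
print(namespace['spam'])

namespace['eggs'] = 2              # assigning a key creates a name, too
print(mod.eggs)

print('ham' in namespace)          # unassigned names are simply missing keys
```

Fetching namespace['ham'] here would raise a KeyError, much as referencing an unassigned variable raises a NameError.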


7. Generic operations. Question answers:

* The + operator doesn’t work on different/mixed types (e.g., string + list, list +
  tuple).
* + doesn’t work for dictionaries, as they aren’t sequences.
* The append method works only for lists, not strings, and keys works only on
  dictionaries. append assumes its target is mutable, since it’s an in-place extension;
  strings are immutable.
* Slicing and concatenation always return a new object of the same type as the
  objects processed:


>>> "x" + 1
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation
>>>
>>> {} + {}
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: bad operand type(s) for +
>>>
>>> [].append(9)
>>> "".append('s')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
AttributeError: attribute-less object
>>>
>>> list({}.keys())                      # list needed in 3.0, not 2.6
[]
>>> [].keys()
Traceback (innermost last):
  File "<stdin>", line 1, in ?
AttributeError: keys
>>>
>>> [][:]
[]
>>> ""[:]
''

8. String indexing. This is a bit of a trick question: because strings are collections of
one-character strings, every time you index a string, you get back a string that can
be indexed again. S[0][0][0][0][0] just keeps indexing the first character over and
over. This generally doesn’t work for lists (lists can hold arbitrary objects) unless
the list contains strings:


>>> S = "spam"
>>> S[0][0][0][0][0]
's'
>>> L = ['s', 'p']
>>> L[0][0][0]
's'


9. Immutable types. Either of the following solutions works. Index assignment 
doesn’t, because strings are immutable: 


>>> S = "spam"
>>> S = S[0] + 'l' + S[2:]
>>> S
'slam'
>>> S = S[0] + 'l' + S[2] + S[3]
>>> S
'slam'


(See also the Python 3.0 bytearray string type in Chapter 36—it’s a mutable sequence 
of small integers that is essentially processed the same as a string.) 


10. Nesting. Here is a sample: 


>>> me = {'name':('John', 'Q', 'Doe'), 'age':'?', 'job':'engineer'}
>>> me['job']
'engineer'
>>> me['name'][2]
'Doe'


11. Files. Here’s one way to create and read back a text file in Python (ls is a Unix
command; use dir on Windows):


# File: maker.py
file = open('myfile.txt', 'w')
file.write('Hello file world!\n')        # Or: open().write()
file.close()                             # close not always needed

# File: reader.py
file = open('myfile.txt')                # 'r' is default open mode
print(file.read())                       # Or print(open().read())

% python maker.py
% python reader.py
Hello file world!

% ls -l myfile.txt
-rwxrwxrwa   1 0        0             19 Apr 13 16:33 myfile.txt


Part III, Statements and Syntax


See “Test Your Knowledge: Part III Exercises” on page 390 in Chapter 15 for the 
exercises. 


1. Coding basic loops. As you work through this exercise, you’ll wind up with code 
that looks like the following: 

>>> S = 'spam'
>>> for c in S:
        print(ord(c))

115
112
97
109

>>> x = 0
>>> for c in S: x += ord(c)              # Or: x = x + ord(c)

>>> x
433

>>> x = []
>>> for c in S: x.append(ord(c))

>>> x
[115, 112, 97, 109]

>>> list(map(ord, S))                    # list() required in 3.0, not 2.6
[115, 112, 97, 109]

2. Backslash characters. The example prints the bell character (\a) 50 times; assuming 
your machine can handle it, and when it’s run outside of IDLE, you may get a series 
of beeps (or one sustained tone, if your machine is fast enough). Hey—I warned 
you. 


3. Sorting dictionaries. Here’s one way to work through this exercise (see Chapter 8 
or Chapter 14 if this doesn’t make sense). Remember, you really do have to split 
up the keys and sort calls like this because sort returns None. In Python 2.2 and 
later, you can iterate through dictionary keys directly without calling keys (e.g., 
for key in D:), but the keys list will not be sorted like it is by this code. In more 
recent Pythons, you can achieve the same effect with the sorted built-in, too: 


>>> D = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7}
>>> D
{'f': 6, 'c': 3, 'a': 1, 'g': 7, 'e': 5, 'd': 4, 'b': 2}
>>>
>>> keys = list(D.keys())                # list() required in 3.0, not in 2.6
>>> keys.sort()
>>> for key in keys:
        print(key, '=>', D[key])

a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
g => 7

>>> for key in sorted(D):                # Better, in more recent Pythons
        print(key, '=>', D[key])


4. Program logic alternatives. Here’s some sample code for the solutions. For step e, 
assign the result of 2 ** X to a variable outside the loops of steps a and b, and use 
it inside the loop. Your results may vary a bit; this exercise is mostly designed to 
get you playing with code alternatives, so anything reasonable gets full credit: 


# a

L = [1, 2, 4, 8, 16, 32, 64]
X = 5

i = 0
while i < len(L):
    if 2 ** X == L[i]:
        print('at index', i)
        break
    i += 1
else:
    print(X, 'not found')


# b

L = [1, 2, 4, 8, 16, 32, 64]
X = 5

for p in L:
    if (2 ** X) == p:
        print((2 ** X), 'was found at', L.index(p))
        break
else:
    print(X, 'not found')


# c

L = [1, 2, 4, 8, 16, 32, 64]
X = 5

if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')


# d

X = 5
L = []
for i in range(7): L.append(2 ** i)
print(L)

if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')


# f

X = 5
L = list(map(lambda x: 2**x, range(7)))    # or [2**x for x in range(7)]
print(L)                                   # list() to print all in 3.0, not 2.6

if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')
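
Step e is described above only in prose. One possible sketch, applied to the for loop of step b, is shown here; the names target and found_at are assumptions introduced for this example, and using enumerate in place of L.index is a small variation rather than something the exercise requires:

```python
L = [1, 2, 4, 8, 16, 32, 64]
X = 5
target = 2 ** X                  # computed once, outside the loop

for index, item in enumerate(L):
    if item == target:
        found_at = index         # remember the offset for later use
        print(target, 'was found at', index)
        break
else:
    found_at = None              # loop finished without a break
    print(X, 'not found')
```

Hoisting the power expression avoids recomputing it on every iteration, though for a list this small the savings are negligible.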


Part IV, Functions 


See “Test Your Knowledge: Part IV Exercises” on page 524 in Chapter 20 for the 
exercises. 


1. The basics. There’s not much to this one, but notice that using print (and hence 
your function) is technically a polymorphic operation, which does the right thing 
for each type of object: 


% python 
>>> def func(x): print(x) 


>>> func("spam") 
spam 
>>> func(42) 
42 
>>> func([1, 2, 3]) 
[1, 2, 3] 
>>> func({'food': 'spam'}) 
{'food': 'spam'} 

2. Arguments. Here’s a sample solution. Remember that you have to use print to see
results in the test calls because a file isn’t the same as code typed interactively;
Python doesn’t normally echo the results of expression statements in files:

def adder(x, y):
    return x + y

print(adder(2, 3))
print(adder('spam', 'eggs'))
print(adder(['a', 'b'], ['c', 'd']))

% python mod.py
5
spameggs
['a', 'b', 'c', 'd']

3. varargs. Two alternative adder functions are shown in the following file, 
adders.py. The hard part here is figuring out how to initialize an accumulator to an 
empty value of whatever type is passed in. The first solution uses manual type 
testing to look for an integer, and an empty slice of the first argument (assumed to 
be a sequence) if the argument is determined not to be an integer. The second 
solution uses the first argument to initialize and scan items 2 and beyond, much 
like one of the min function variants shown in Chapter 18. 


The second solution is better. Both of these assume all arguments are of the same 
type, and neither works on dictionaries (as we saw in Part II, + doesn’t work on 
mixed types or dictionaries). You could add a type test and special code to allow 
dictionaries, too, but that’s extra credit. 


def adder1(*args):
    print('adder1', end=' ')
    if type(args[0]) == type(0):         # Integer?
        sum = 0                          # Init to zero
    else:                                # else sequence:
        sum = args[0][:0]                # Use empty slice of arg1
    for arg in args:
        sum = sum + arg
    return sum

def adder2(*args):
    print('adder2', end=' ')
    sum = args[0]                        # Init to arg1
    for next in args[1:]:
        sum += next                      # Add items 2..N
    return sum

for func in (adder1, adder2):
    print(func(2, 3, 4))
    print(func('spam', 'eggs', 'toast'))
    print(func(['a', 'b'], ['c', 'd'], ['e', 'f']))


% python adders.py
adder1 9
adder1 spameggstoast
adder1 ['a', 'b', 'c', 'd', 'e', 'f']
adder2 9
adder2 spameggstoast
adder2 ['a', 'b', 'c', 'd', 'e', 'f']
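
For the “extra credit” case mentioned above, a type test for dictionaries might look like the following sketch. The function name adder_any is hypothetical, and merging with update is only one reasonable reading of what “adding” dictionaries should mean:

```python
def adder_any(*args):
    """Sum numbers, concatenate sequences, or merge dictionaries."""
    first = args[0]
    if isinstance(first, dict):          # special case: + fails on dictionaries
        result = {}
        for arg in args:
            result.update(arg)           # later keys overwrite earlier ones
        return result
    result = first
    for arg in args[1:]:
        result = result + arg            # builds new objects; args are unchanged
    return result

print(adder_any(2, 3, 4))
print(adder_any('spam', 'eggs'))
print(adder_any({'a': 1}, {'b': 2}))
```

Like the two solutions above, this still assumes all arguments are of the same type.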

4. Keywords. Here is my solution to the first and second parts of this exercise (coded
in the file mod.py). To iterate over keyword arguments, use the **args form in the
function header and use a loop (e.g., for x in args.keys(): use args[x]), or use
args.values() to make this the same as summing *args positionals:


def adder(good=1, bad=2, ugly=3):
    return good + bad + ugly

print(adder())
print(adder(5))
print(adder(5, 6))
print(adder(5, 6, 7))
print(adder(ugly=7, good=6, bad=5))

% python mod.py
6
10
14
18
18

# Second part solutions

def adder1(*args):                       # Sum any number of positional args
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder2(**args):                      # Sum any number of keyword args
    argskeys = list(args.keys())         # list needed in 3.0!
    tot = args[argskeys[0]]
    for key in argskeys[1:]:
        tot += args[key]
    return tot

def adder3(**args):                      # Same, but convert to list of values
    args = list(args.values())           # list needed to index in 3.0!
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder4(**args):                      # Same, but reuse positional version
    return adder1(*args.values())

print(adder1(1, 2, 3),       adder1('aa', 'bb', 'cc'))
print(adder2(a=1, b=2, c=3), adder2(a='aa', b='bb', c='cc'))
print(adder3(a=1, b=2, c=3), adder3(a='aa', b='bb', c='cc'))
print(adder4(a=1, b=2, c=3), adder4(a='aa', b='bb', c='cc'))

5. (and 6.) Here are my solutions to exercises 5 and 6 (file dicts.py). These are just
coding exercises, though, because Python 1.5 added the dictionary methods
D.copy() and D1.update(D2) to handle things like copying and adding (merging)
dictionaries. (See Python’s library manual or O’Reilly’s Python Pocket Reference
for more details.) X[:] doesn’t work for dictionaries, as they’re not sequences (see
Chapter 8 for details). Also, remember that if you assign (e = d) rather than copying,
you generate a reference to a shared dictionary object; changing d changes e,
too:


def copyDict(old):
    new = {}
    for key in old.keys():
        new[key] = old[key]
    return new

def addDict(d1, d2):
    new = {}
    for key in d1.keys():
        new[key] = d1[key]
    for key in d2.keys():
        new[key] = d2[key]
    return new


% python
>>> from dicts import *
>>> d = {1: 1, 2: 2}
>>> e = copyDict(d)
>>> d[2] = '?'
>>> d
{1: 1, 2: '?'}
>>> e
{1: 1, 2: 2}

>>> x = {1: 1}
>>> y = {2: 2}
>>> z = addDict(x, y)
>>> z
{1: 1, 2: 2}

6. See #5.

7. More argument-matching examples. Here is the sort of interaction you should get,
along with comments that explain the matching that goes on:


def f1(a, b): print(a, b)                # Normal args
def f2(a, *b): print(a, b)               # Positional varargs
def f3(a, **b): print(a, b)              # Keyword varargs
def f4(a, *b, **c): print(a, b, c)       # Mixed modes
def f5(a, b=2, c=3): print(a, b, c)      # Defaults
def f6(a, b=2, *c): print(a, b, c)       # Defaults and positional varargs

% python
>>> f1(1, 2)                             # Matched by position (order matters)
1 2
>>> f1(b=2, a=1)                         # Matched by name (order doesn't matter)
1 2

>>> f2(1, 2, 3)                          # Extra positionals collected in a tuple
1 (2, 3)

>>> f3(1, x=2, y=3)                      # Extra keywords collected in a dictionary
1 {'x': 2, 'y': 3}

>>> f4(1, 2, 3, x=2, y=3)                # Extra of both kinds
1 (2, 3) {'x': 2, 'y': 3}

>>> f5(1)                                # Both defaults kick in
1 2 3
>>> f5(1, 4)                             # Only one default used
1 4 3

>>> f6(1)                                # One argument: matches "a"
1 2 ()
>>> f6(1, 3, 4)                          # Extra positional collected
1 3 (4,)

8. Primes revisited. Here is the primes example, wrapped up in a function and a mod- 
ule (file primes.py) so it can be run multiple times. I added an if test to trap neg- 
atives, 0, and 1. I also changed / to // in this edition to make this solution immune 
to the Python 3.0 / true division changes we studied in Chapter 5, and to enable it 
to support floating-point numbers (uncomment the from statement and 
change // to / to see the differences in 2.6): 


#from __future__ import division

def prime(y):
    if y <= 1:                                   # For some y > 1
        print(y, 'not prime')
    else:
        x = y // 2                               # 3.0 / fails
        while x > 1:
            if y % x == 0:                       # No remainder?
                print(y, 'has factor', x)
                break                            # Skip else
            x -= 1
        else:
            print(y, 'is prime')

prime(13); prime(13.0)
prime(15); prime(15.0)
prime(3);  prime(2)
prime(1);  prime(-3)

Here is the module in action; the // operator allows it to work for floating-point
numbers too, even though it perhaps should not:

% python primes.py
13 is prime
13.0 is prime
15 has factor 5
15.0 has factor 5.0
3 is prime
2 is prime
1 not prime
-3 not prime


This function still isn’t very reusable—it could return values, instead of printing 
—but it’s enough to run experiments. It’s also not a strict mathematical prime 
(floating points work), and it’s still inefficient. Improvements are left as exercises 
for more mathematically minded readers. (Hint: a for loop over range(y, 1, -1) 
may be a bit quicker than the while, but the algorithm is the real bottleneck here.) 
To time alternatives, use the built-in time module and coding patterns like those 
used in this general function-call timer (see the library manual for details): 

def timer(reps, func, *args):
    import time
    start = time.clock()
    for i in range(reps):
        func(*args)
    return time.clock() - start
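
One caveat for readers on newer interpreters: time.clock was removed in Python 3.8, so the timer above runs only on older releases such as the 2.6 and 3.0 versions used in this book. A hedged sketch of the same pattern using time.perf_counter (available in Python 3.3 and later) follows; the call being timed, pow(2, 10), is just an arbitrary stand-in:

```python
import time

def timer(reps, func, *args):
    """Return total elapsed seconds for reps calls of func(*args)."""
    start = time.perf_counter()          # high-resolution timer for intervals
    for i in range(reps):
        func(*args)
    return time.perf_counter() - start

elapsed = timer(1000, pow, 2, 10)        # time 1,000 calls to pow(2, 10)
print(elapsed)
```

Absolute times vary by machine and Python version, so compare alternatives only within the same session.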

9. List comprehensions. Here is the sort of code you should write; I may have a preference,
but I’m not telling:


>>> values = [2, 4, 9, 16, 25] 
>>> import math 


>>> res = [] 
>>> for x in values: res.append(math.sqrt(x)) 


>>> res 
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0] 


>>> list(map(math.sqrt, values))
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0] 


>>> [math.sqrt(x) for x in values] 
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0] 


10. Timing tools. Here is some code I wrote to time the three square root options, along 
with the results in 2.6 and 3.0. The last result of each function is printed to verify 
that all three do the same work: 


# File mytimer.py (2.6 and 3.0)
...same as listed in Chapter 20...

# File timesqrt.py
import sys, mytimer
reps = 10000
repslist = range(reps)                   # Pull out range list time for 2.6
from math import sqrt                    # Not math.sqrt: adds attr fetch time

def mathMod():
    for i in repslist:
        res = sqrt(i)
    return res

def powCall():
    for i in repslist:
        res = pow(i, .5)
    return res

def powExpr():
    for i in repslist:
        res = i ** .5
    return res

print(sys.version)
for tester in (mytimer.timer, mytimer.best):
    print('<%s>' % tester.__name__)
    for test in (mathMod, powCall, powExpr):
        elapsed, result = tester(test)
        print('-'*35)
        print('%s: %.5f => %s' %
              (test.__name__, elapsed, result))


Following are the test results for Python 3.0 and 2.6. For both, it looks like the 
math module is quicker than the ** expression, which is quicker than the pow call; 
however, you should try this with your code and on your own machine and version 
of Python. Also, note that Python 3.0 is nearly twice as slow as 2.6 on this test; 3.1 
or later might perform better (time this in the future to see for yourself): 

c:\misc> c:\python30\python timesqrt.py 


3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)]
<timer> 


<best> 


powExpr: 0.00540 => 99.994999875 


c:\misc> c:\python26\python timesqrt.py 
2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)]
<timer> 


powExpr: 3.12502 => 99.994999875 
<best> 


powExpr: 0.00287 => 99.994999875 


To time the relative speeds of Python 3.0 dictionary comprehensions and equivalent 
for loops interactively, run a session like the following. It appears that the two are 
roughly the same in this regard under Python 3.0; unlike list comprehensions, 
though, manual loops are slightly faster than dictionary comprehensions today 
(though the difference isn’t exactly earth-shattering—at the end we save half a 
second when making 50 dictionaries of 1,000,000 items each). Again, rather than 
taking these results as gospel you should investigate further on your own, on your 
computer and with your Python: 


c:\misc> c:\python30\python
>>>
>>> def dictcomp(I):
        return {i: i for i in range(I)}

>>> def dictloop(I):
        new = {}
        for i in range(I): new[i] = i
        return new

>>> dictcomp(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> dictloop(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
>>> from mytimer import best, timer
>>> best(dictcomp, 10000)[0]                 # 10,000-item dict
0.0013519874732672577
>>> best(dictloop, 10000)[0]
0.001132965223233029
>>>
>>> best(dictcomp, 100000)[0]                # 100,000 items: 10 times slower
0.01816089754424155
>>> best(dictloop, 100000)[0]
0.01643484018219965
>>>
>>> best(dictcomp, 1000000)[0]               # 1,000,000 items: 10X time
0.18685105229855026
>>> best(dictloop, 1000000)[0]               # Time for making one dict
0.1769041177020938
>>>
>>> timer(dictcomp, 1000000, _reps=50)[0]    # 1,000,000-item dict
10.692516087938543
>>> timer(dictloop, 1000000, _reps=50)[0]    # Time for making 50
10.197276050447755


Part V, Modules 


See “Test Your Knowledge: Part V Exercises” on page 605 in Chapter 24 for the 
exercises. 


1. Import basics. This one is simpler than you may think. When you’re done, your 
file (mymod.py) and interaction should look similar to the following; remember 
that Python can read a whole file into a list of line strings, and the len built-in 
returns the lengths of strings and lists: 

def countLines(name):
    file = open(name)
    return len(file.readlines())

def countChars(name):
    return len(open(name).read())

def test(name):                                  # Or pass file object
    return countLines(name), countChars(name)    # Or return a dictionary

% python
>>> import mymod
>>> mymod.test('mymod.py')
(10, 291)


Note that these functions load the entire file in memory all at once, so they won’t 
work for pathologically large files too big for your machine’s memory. To be more 
robust, you could read line by line with iterators instead and count as you go: 

def countLines(name):
    tot = 0
    for line in open(name): tot += 1
    return tot

def countChars(name):
    tot = 0
    for line in open(name): tot += len(line)
    return tot


On Unix, you can verify your output with a wc command; on Windows, right-click
on your file to view its properties. But note that your script may report fewer
characters than Windows does: for portability, Python converts Windows \r\n
line-end markers to \n, thereby dropping one byte (character) per line. To match byte
counts with Windows exactly, you have to open in binary mode ('rb'), or add the
number of bytes corresponding to the number of lines.

Incidentally, to do the “ambitious” part of this exercise (passing in a file object so
you only open the file once), you’ll probably need to use the seek method of the
built-in file object. We didn’t cover it in the text, but it works just like C’s fseek
call (and calls it behind the scenes): seek resets the current position in the file to a
passed-in offset. After a seek, future input/output operations are relative to the new
position. To rewind to the start of a file without closing and reopening it, call
file.seek(0); the file read methods all pick up at the current position in the file,
so you need to rewind to reread. Here’s what this tweak would look like:

def countLines(file):
    file.seek(0)                                 # Rewind to start of file
    return len(file.readlines())

def countChars(file):
    file.seek(0)                                 # Ditto (rewind if needed)
    return len(file.read())

def test(name):
    file = open(name)                            # Pass file object
    return countLines(file), countChars(file)    # Open file only once

>>> import mymod2
>>> mymod2.test("mymod2.py")
(11, 392)
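
The line-end caveat in this answer can be checked directly. The following is a minimal sketch, not part of mymod.py: the helper names (count_chars_text, count_bytes_binary, demo) and the two-line sample file are made up for illustration. It writes a file with Windows-style \r\n markers and counts it once in text mode and once in binary mode:

```python
import os
import tempfile

def count_chars_text(name):
    # Text mode: on input, \r\n is translated to \n before we see it
    with open(name) as f:
        return sum(len(line) for line in f)

def count_bytes_binary(name):
    # Binary mode: no translation, so every byte on disk is counted
    with open(name, 'rb') as f:
        return len(f.read())

def demo():
    # Hypothetical sample data: two lines, each ending in \r\n
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'spam\r\neggs\r\n')
    try:
        return count_chars_text(path), count_bytes_binary(path)
    finally:
        os.remove(path)

print(demo())
```

The two counts differ by one per line, which is the discrepancy described above when comparing against the size Windows reports.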


2. from/from *. Here’s the from * part; replace * with countChars to do the rest: 

% python
>>> from mymod import *
>>> countChars("mymod.py")
291

3. __main__. If you code it properly, it works in either mode (program run or module
import):

def countLines(name):
    file = open(name)
    return len(file.readlines())

def countChars(name):
    return len(open(name).read())

def test(name):                                  # Or pass file object
    return countLines(name), countChars(name)    # Or return a dictionary

if __name__ == '__main__':
    print(test('mymod.py'))

% python mymod.py
(13, 346)


This is where I would probably begin to consider using command-line arguments
or user input to provide the filename to be counted, instead of hardcoding it in the
script (see Chapter 24 for more on sys.argv, and Chapter 10 for more on input):

if __name__ == '__main__':
    print(test(input('Enter file name:')))

if __name__ == '__main__':
    import sys
    print(test(sys.argv[1]))

4. Nested imports. Here is my solution (file myclient.py): 
from mymod import countLines, countChars 
print(countLines('mymod.py'), countChars('mymod.py')) 


% python myclient.py 
13 346 


As for the rest of this one, mymod’s functions are accessible (that is, importable) from 
the top level of myclient, since from simply assigns to names in the importer (it 
works almost as though mymod’s defs appeared in myclient). For example, another 
file can say this: 


import myclient 
myclient.countLines(...) 


from myclient import countChars 
countChars(...) 


If myclient used import instead of from, you’d need to use a path to get to the 
functions in mymod through myclient: 


import myclient 
myclient.mymod.countLines(...) 


from myclient import mymod 
mymod.countChars(...) 


In general, you can define collector modules that import all the names from other 
modules so they’re available in a single convenience module. Using the following 
code, you wind up with three different copies of the name somename (mod1.somename,
collector.somename, and __main__.somename); all three share the same integer object
initially, and only the name somename exists at the interactive prompt as is:


# File mod1.py
somename = 42

# File collector.py
from mod1 import *                # Collect lots of names here
from mod2 import *                # from assigns to my names
from mod3 import *

>>> from collector import somename
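
To see the name-copying behavior in a single run, here is a rough sketch that writes two throwaway module files to a temporary directory and imports them. The module names mod1_demo and collector_demo are invented for this illustration (they stand in for mod1 and collector), and mod2 and mod3 are left out:

```python
import importlib
import os
import shutil
import sys
import tempfile

# Hypothetical stand-ins for mod1.py and collector.py
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'mod1_demo.py'), 'w') as f:
    f.write('somename = 42\n')
with open(os.path.join(tmpdir, 'collector_demo.py'), 'w') as f:
    f.write('from mod1_demo import *\n')

sys.path.insert(0, tmpdir)
try:
    importlib.invalidate_caches()
    mod1 = importlib.import_module('mod1_demo')
    collector = importlib.import_module('collector_demo')
finally:
    sys.path.remove(tmpdir)
    shutil.rmtree(tmpdir, ignore_errors=True)

somename = collector.somename          # Same effect as: from collector import somename

# Three names, one shared integer object
shared = mod1.somename is collector.somename is somename

# Rebinding one name does not touch the other two copies
mod1.somename = 99
print(shared, mod1.somename, collector.somename, somename)
```

Because from assigns names rather than linking them, changing mod1.somename afterward leaves the collector's copy and the local copy at their original value.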


5. Package imports. For this, I put the mymod.py solution file listed for exercise 3 into
a directory package. The following is what I did to set up the directory and its
required __init__.py file in a Windows console interface; you’ll need to interpolate
for other platforms (e.g., use mv and vi instead of move and edit). This works in any
directory (I just happened to run my commands in Python’s install directory), and
you can do some of this from a file explorer GUI, too.


When I was done, I had a mypkg subdirectory that contained the files 
__init__.py and mymod.py. You need an __init__.py in the mypkg directory, but 
not in its parent; mypkg is located in the home directory component of the module 
search path. Notice how a print statement coded in the directory’s initialization 
file fires only the first time it is imported, not the second: 

C:\Python30> mkdir mypkg
C:\Python30> move mymod.py mypkg\mymod.py
C:\Python30> edit mypkg\__init__.py
...coded a print statement...

C:\Python30> python
>>> import mypkg.mymod
initializing mypkg
>>> mypkg.mymod.countLines('mypkg\mymod.py')
13
>>> from mypkg.mymod import countChars
>>> countChars('mypkg\mymod.py')
346

6. Reloads. This exercise just asks you to experiment with changing the changer.py
example in the book, so there’s nothing to show here.


7. Circular imports. The short story is that importing recur2 first works because the 
recursive import then happens at the import in recur1, not at a from in recur2. 


The long story goes like this: importing recur2 first works because the recursive 
import from recur1 to recur2 fetches recur2 as a whole, instead of getting specific 
names. recur2 is incomplete when it’s imported from recur1, but because it uses 
import instead of from, you’re safe: Python finds and returns the already created 
recur2 module object and continues to run the rest of recur1 without a glitch. 
When the recur2 import resumes, the second from finds the name Y in recur1 (it’s
been run completely), so no error is reported. Running a file as a script is not the
same as importing it as a module; these cases are the same as running the first
import or from in the script interactively. For instance, running recur1 as a script
is the same as importing recur2 interactively, as recur2 is the first module imported
in recur1.
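
The two import orders can be compared in one run. This is a sketch under assumptions: the file contents below are a reconstruction of the exercise's recur1.py and recur2.py (which are not listed in this answer), and the _demo suffixes and the try_import helper are invented here to keep the experiment isolated:

```python
import importlib
import os
import shutil
import sys
import tempfile

# Assumed layout of the exercise files (reconstructed, with demo names)
FILES = {
    'recur1_demo.py': 'X = 1\nimport recur2_demo\nY = 2\n',
    'recur2_demo.py': 'from recur1_demo import X\nfrom recur1_demo import Y\n',
}
NAMES = ('recur1_demo', 'recur2_demo')

def try_import(first):
    """Import `first` from a clean slate; return 'ok' or the exception's class name."""
    for name in NAMES:
        sys.modules.pop(name, None)
    try:
        importlib.import_module(first)
        return 'ok'
    except ImportError as exc:
        return type(exc).__name__

tmpdir = tempfile.mkdtemp()
for fname, text in FILES.items():
    with open(os.path.join(tmpdir, fname), 'w') as f:
        f.write(text)

sys.path.insert(0, tmpdir)
try:
    importlib.invalidate_caches()
    recur2_first = try_import('recur2_demo')   # Recursive step is a plain import
    recur1_first = try_import('recur1_demo')   # Recursive step is a from: Y not set yet
finally:
    sys.path.remove(tmpdir)
    for name in NAMES:
        sys.modules.pop(name, None)
    shutil.rmtree(tmpdir, ignore_errors=True)

print(recur2_first, recur1_first)
```

Importing the recur2 stand-in first succeeds, while importing the recur1 stand-in first fails at the from that asks for Y before it has been assigned.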


Part VI, Classes and OOP 


See “Test Your Knowledge: Part VI Exercises” on page 816 in Chapter 31 for the 
exercises. 


1. Inheritance. Here’s the solution code for this exercise (file adder.py), along with
some interactive tests. The __add__ overload has to appear only once, in the superclass,
as it invokes type-specific add methods in subclasses:


class Adder:
    def add(self, x, y):
        print('not implemented!')
    def __init__(self, start=[]):
        self.data = start
    def __add__(self, other):                    # Or in subclasses?
        return self.add(self.data, other)        # Or return type?

class ListAdder(Adder):
    def add(self, x, y):
        return x + y

class DictAdder(Adder):
    def add(self, x, y):
        new = {}
        for k in x.keys(): new[k] = x[k]
        for k in y.keys(): new[k] = y[k]
        return new

% python
>>> from adder import *
>>> x = Adder()
>>> x.add(1, 2)
not implemented!
>>> x = ListAdder()
>>> x.add([1], [2])
[1, 2]
>>> x = DictAdder()
>>> x.add({1:1}, {2:2})
{1: 1, 2: 2}

>>> x = Adder([1])
>>> x + [2]
not implemented!
>>>
>>> x = ListAdder([1])
>>> x + [2]
[1, 2]
>>> [2] + x
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: __add__ nor __radd__ defined for these operands


Notice in the last test that you get an error for expressions where a class instance
appears on the right of a +; if you want to fix this, use __radd__ methods, as described
in “Operator Overloading” in Chapter 29.


If you are saving a value in the instance anyhow, you might as well rewrite the 
add method to take just one argument, in the spirit of other examples in this part 
of the book: 


class Adder:
    def __init__(self, start=[]):
        self.data = start
    def __add__(self, other):                    # Pass a single argument
        return self.add(other)                   # The left side is in self
    def add(self, y):
        print('not implemented!')

class ListAdder(Adder):
    def add(self, y):
        return self.data + y

class DictAdder(Adder):
    def add(self, y):
        pass                                     # Change to use self.data instead of x

x = ListAdder([1, 2, 3])
y = x + [4, 5, 6]
print(y)                                         # Prints [1, 2, 3, 4, 5, 6]


Because values are attached to objects rather than passed around, this version is
arguably more object-oriented. And, once you’ve gotten to this point, you’ll probably
find that you can get rid of add altogether and simply define type-specific
__add__ methods in the two subclasses.
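
One possible shape for that last step is sketched below, with __radd__ added so an instance may also appear on the right of a +. This is an illustration rather than the book's own solution; in particular, how DictAdder resolves clashing keys is an assumption made here:

```python
class Adder:
    def __init__(self, start):
        self.data = start

class ListAdder(Adder):
    def __add__(self, other):              # instance + list
        return self.data + other
    def __radd__(self, other):             # list + instance
        return other + self.data

class DictAdder(Adder):
    def __add__(self, other):              # Assumption: right operand wins on key clashes
        new = dict(self.data)
        new.update(other)
        return new
    def __radd__(self, other):             # Here self.data is the right operand
        new = dict(other)
        new.update(self.data)
        return new

x = ListAdder([1])
left, right = x + [2], [2] + x

d = DictAdder({1: 1})
merged_left, merged_right = d + {2: 2}, {2: 2} + d
print(left, right, merged_left, merged_right)
```

With the add method gone, each subclass owns its operator logic outright, and the superclass is left holding only the shared constructor.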


2. Operator overloading. The solution code (file mylist.py) uses a few operator overloading
methods that the text didn’t say much about, but they should be straightforward
to understand. Copying the initial value in the constructor is important
because it may be mutable; you don’t want to change or have a reference to an
object that’s possibly shared somewhere outside the class. The __getattr__ method
routes calls to the wrapped list. For hints on an easier way to code this in Python
2.2 and later, see “Extending Types by Subclassing” on page 775 in Chapter 31:


class MyList:
    def __init__(self, start):
        #self.wrapped = start[:]                  # Copy start: no side effects
        self.wrapped = []                         # Make sure it's a list here
        for x in start: self.wrapped.append(x)
    def __add__(self, other):
        return MyList(self.wrapped + other)
    def __mul__(self, time):
        return MyList(self.wrapped * time)
    def __getitem__(self, offset):
        return self.wrapped[offset]
    def __len__(self):
        return len(self.wrapped)
    def __getslice__(self, low, high):
        return MyList(self.wrapped[low:high])
    def append(self, node):
        self.wrapped.append(node)
    def __getattr__(self, name):                  # Other methods: sort/reverse/etc
        return getattr(self.wrapped, name)
    def __repr__(self):
        return repr(self.wrapped)

if __name__ == '__main__':
    x = MyList('spam')
    print(x)
    print(x[2])
    print(x[1:])
    print(x + ['eggs'])
    print(x * 3)
    x.append('a')
    x.sort()
    for c in x: print(c, end=' ')

% python mylist.py
['s', 'p', 'a', 'm']
a
['p', 'a', 'm']
['s', 'p', 'a', 'm', 'eggs']
['s', 'p', 'a', 'm', 's', 'p', 'a', 'm', 's', 'p', 'a', 'm']
a a m p s


Note that it’s important to copy the start value by appending instead of slicing here,
because otherwise the result may not be a true list and so will not respond to
expected list methods, such as append (e.g., slicing a string returns another string,
not a list). You would be able to copy a MyList start value by slicing because its
class overloads the slicing operation and provides the expected list interface; however,
you need to avoid slice-based copying for objects such as strings. Also, note
that sets are a built-in type in Python today, so this is largely just a coding exercise
(see Chapter 5 for more on sets).


3. Subclassing. My solution (mysub.py) appears below. Your solution should be 
similar: 


from mylist import MyList

class MyListSub(MyList):
    calls = 0                                     # Shared by instances

    def __init__(self, start):
        self.adds = 0                             # Varies in each instance
        MyList.__init__(self, start)

    def __add__(self, other):
        MyListSub.calls += 1                      # Class-wide counter
        self.adds += 1                            # Per-instance counts
        return MyList.__add__(self, other)

    def stats(self):
        return self.calls, self.adds              # All adds, my adds

if __name__ == '__main__':
    x = MyListSub('spam')
    y = MyListSub('foo')
    print(x[2])
    print(x[1:])
    print(x + ['eggs'])
    print(x + ['toast'])
    print(y + ['bar'])
    print(x.stats())

% python mysub.py
a
['p', 'a', 'm']
['s', 'p', 'a', 'm', 'eggs']
['s', 'p', 'a', 'm', 'toast']
['f', 'o', 'o', 'bar']
(3, 2)


4. Metaclass methods. I worked through this exercise as follows. Notice that in Python
2.6, operators try to fetch attributes through __getattr__, too; you need to return
a value to make them work. Caveat: as noted in Chapter 30, __getattr__ is not
called for built-in operations in Python 3.0, so the following expression won’t work
as shown; in 3.0, a class like this must redefine __X__ operator overloading methods
explicitly. More on this in Chapters 30, 37, and 38.


>>> class Meta:
        def __getattr__(self, name):
            print('get', name)
        def __setattr__(self, name, value):
            print('set', name, value)

>>> x = Meta()
>>> x.append
get append
>>> x.spam = "pork"
set spam pork
>>>
>>> x + 2
get __coerce__
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: call of non-function
>>>
>>> x[1]
get __getitem__
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: call of non-function

>>> x[1:5]
get __len__
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: call of non-function

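
The 3.0 caveat in this answer can be seen with a small delegation class. This is a sketch only; the class names Wrapper and NoOps and the _wrapped attribute are invented for illustration. Named attribute fetches still route through __getattr__, but indexing and + reach the wrapped object only when the operator methods are coded explicitly:

```python
class NoOps:
    """Delegates by name only: built-in operations bypass __getattr__ in 3.X."""
    def __init__(self, wrapped):
        self._wrapped = wrapped
    def __getattr__(self, name):
        return getattr(self._wrapped, name)

class Wrapper(NoOps):
    """Adds explicit operator methods so expressions reach the wrapped object."""
    def __getitem__(self, index):
        return self._wrapped[index]
    def __add__(self, other):
        return self._wrapped + other

w = Wrapper([1, 2])
by_name = w.count(1)            # Routed through __getattr__
by_index = w[0]                 # Needs the explicit __getitem__
by_plus = w + [3]               # Needs the explicit __add__

try:
    NoOps([1, 2])[0]            # No __getitem__ defined in the class
    indexing_failed = False
except TypeError:
    indexing_failed = True

print(by_name, by_index, by_plus, indexing_failed)
```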
5. Set objects. Here’s the sort of interaction you should get. Comments explain which
methods are called:

% python
>>> from setwrapper import Set
>>> x = Set([1, 2, 3, 4])             # Runs __init__
>>> y = Set([3, 4, 5])

>>> x & y                             # __and__, intersect, then __repr__
Set:[3, 4]
>>> x | y                             # __or__, union, then __repr__
Set:[1, 2, 3, 4, 5]

>>> z = Set("hello")                  # __init__ removes duplicates
>>> z[0], z[-1]                       # __getitem__
('h', 'o')

>>> for c in z: print(c, end=' ')     # __getitem__
h e l o
>>> len(z), z                         # __len__, __repr__
(4, Set:['h', 'e', 'l', 'o'])

>>> z & "mello", z | "mello"
(Set:['e', 'l', 'o'], Set:['h', 'e', 'l', 'o', 'm'])


My solution to the multiple-operand extension subclass looks like the following
class (file multiset.py). It only needs to replace two methods in the original set. The
class’s documentation string explains how it works:

from setwrapper import Set

class MultiSet(Set):
    """
    Inherits all Set names, but extends intersect
    and union to support multiple operands; note
    that "self" is still the first argument (stored
    in the *args argument now); also note that the
    inherited & and | operators call the new methods
    here with 2 arguments, but processing more than
    2 requires a method call, not an expression:
    """
    def intersect(self, *others):
        res = []
        for x in self:                        # Scan first sequence
            for other in others:              # For all other args
                if x not in other: break      # Item in each one?
            else:                             # No: break out of loop
                res.append(x)                 # Yes: add item to end
        return Set(res)

    def union(*args):                         # self is args[0]
        res = []
        for seq in args:                      # For all args
            for x in seq:                     # For all nodes
                if not x in res:
                    res.append(x)             # Add new items to result
        return Set(res)


Your interaction with the extension will look something like the following. Note
that you can intersect by using & or calling intersect, but you must call
intersect for three or more operands; & is a binary (two-sided) operator. Also, note
that we could have called MultiSet simply Set to make this change more transparent
if we used setwrapper.Set to refer to the original within multiset:

>>> from multiset import *
>>> x = MultiSet([1,2,3,4])
>>> y = MultiSet([3,4,5])
>>> z = MultiSet([0,1,2])

>>> x & y, x | y                               # Two operands
(Set:[3, 4], Set:[1, 2, 3, 4, 5])

>>> x.intersect(y, z)                          # Three operands
Set:[]
>>> x.union(y, z)
Set:[1, 2, 3, 4, 5, 0]

>>> x.intersect([1,2,3], [2,3,4], [1,2,3])     # Four operands
Set:[2, 3]
>>> x.union(range(10))                         # Non-MultiSets work, too
Set:[1, 2, 3, 4, 0, 5, 6, 7, 8, 9]
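
If setwrapper.py is not at hand, the extension can still be tried with a stand-in superclass. The Set class below is a minimal assumption written for this sketch, not the book's setwrapper.Set; only the MultiSet methods mirror the solution above (intersect is compressed into a comprehension):

```python
class Set:
    """Minimal stand-in for setwrapper.Set (an assumption for this sketch)."""
    def __init__(self, value=()):
        self.data = []
        for x in value:                       # Drop duplicates, keep order
            if x not in self.data:
                self.data.append(x)
    def intersect(self, other):
        return Set([x for x in self.data if x in other])
    def union(self, other):
        return Set(self.data + [x for x in other if x not in self.data])
    def __and__(self, other): return self.intersect(other)
    def __or__(self, other):  return self.union(other)
    def __iter__(self):       return iter(self.data)
    def __len__(self):        return len(self.data)
    def __repr__(self):       return 'Set:' + repr(self.data)

class MultiSet(Set):
    def intersect(self, *others):             # Keep items found in every operand
        return Set([x for x in self if all(x in other for other in others)])
    def union(*args):                         # self is args[0]
        res = []
        for seq in args:
            for x in seq:
                if x not in res:
                    res.append(x)
        return Set(res)

x = MultiSet([1, 2, 3, 4])
y = MultiSet([3, 4, 5])
z = MultiSet([0, 1, 2])
print(x & y, x | y)
print(x.intersect(y, z), x.union(y, z))
print(x.intersect([1, 2, 3], [2, 3, 4], [1, 2, 3]))
```

The inherited & and | operators dispatch to the redefined methods with a single extra operand, which is why only intersect and union need replacing.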

6. Class tree links. Here is the way I changed the lister classes, and a rerun of the test
to show its format. Do the same for the dir-based version, and also do this when
formatting class objects in the tree climber variant:

class ListInstance:
    def __str__(self):
        return '<Instance of %s(%s), address %s:\n%s>' % (
                           self.__class__.__name__,        # My class's name
                           self.__supers(),                # My class's own supers
                           id(self),                       # My address
                           self.__attrnames())             # name=value list
    def __attrnames(self):
        ...unchanged...
    def __supers(self):
        names = []
        for super in self.__class__.__bases__:             # One level up from class
            names.append(super.__name__)                   # name, not str(super)
        return ', '.join(names)

C:\misc> python testmixin.py
<Instance of Sub(Super, ListInstance), address 7841200:
        name data1=spam
        name data2=eggs
        name data3=42
>



7. Composition. My solution is below (file lunch.py), with comments from the description
mixed in with the code. This is one case where it’s probably easier to
express a problem in Python than it is in English:

class Lunch:
    def __init__(self):                           # Make/embed Customer, Employee
        self.cust = Customer()
        self.empl = Employee()
    def order(self, foodName):                    # Start Customer order simulation
        self.cust.placeOrder(foodName, self.empl)
    def result(self):                             # Ask the Customer about its Food
        self.cust.printFood()

class Customer:
    def __init__(self):                           # Initialize my food to None
        self.food = None
    def placeOrder(self, foodName, employee):     # Place order with Employee
        self.food = employee.takeOrder(foodName)
    def printFood(self):                          # Print the name of my food
        print(self.food.name)

class Employee:
    def takeOrder(self, foodName):                # Return Food, with desired name
        return Food(foodName)

class Food:
    def __init__(self, name):                     # Store food name
        self.name = name

if __name__ == '__main__':
    x = Lunch()                                   # Self-test code
    x.order('burritos')                           # If run, not imported
    x.result()
    x.order('pizza')
    x.result()

% python lunch.py
burritos
pizza


8. Zoo animal hierarchy. Here is the way I coded the taxonomy in Python (file
zoo.py); it’s artificial, but the general coding pattern applies to many real structures,
from GUIs to employee databases. Notice that the self.speak reference in Animal
triggers an independent inheritance search, which finds speak in a subclass. Test
this interactively per the exercise description. Try extending this hierarchy with
new classes, and making instances of various classes in the tree:

class Animal:
    def reply(self):   self.speak()               # Back to subclass
    def speak(self):   print('spam')              # Custom message

class Mammal(Animal):
    def speak(self):   print('huh?')

class Cat(Mammal):
    def speak(self):   print('meow')

class Dog(Mammal):
    def speak(self):   print('bark')

class Primate(Mammal):
    def speak(self):   print('Hello world!')

class Hacker(Primate): pass                       # Inherit from Primate

9. The Dead Parrot Sketch. Here’s how I implemented this one (file parrot.py). Notice
how the line method in the Actor superclass works: by accessing self attributes
twice, it sends Python back to the instance twice, and hence invokes two inheritance
searches; self.name and self.says() find information in the specific subclasses:

class Actor:
    def line(self): print(self.name + ':', repr(self.says()))

class Customer(Actor):
    name = 'customer'
    def says(self): return "that's one ex-bird!"

class Clerk(Actor):
    name = 'clerk'
    def says(self): return "no it isn't..."

class Parrot(Actor):
    name = 'parrot'
    def says(self): return None

class Scene:
    def __init__(self):
        self.clerk    = Clerk()                   # Embed some instances
        self.customer = Customer()                # Scene is a composite
        self.subject  = Parrot()

    def action(self):
        self.customer.line()                      # Delegate to embedded
        self.clerk.line()
        self.subject.line()


Part VII, Exceptions and Tools 


See “Test Your Knowledge: Part VII Exercises” on page 891 in Chapter 35 for the 
exercises. 


1. try/except. My version of the oops function (file oops.py) follows. As for the
noncoding questions, changing oops to raise a KeyError instead of an IndexError
means that the try handler won’t catch the exception (it “percolates” to the top
level and triggers Python’s default error message). The names KeyError and
IndexError come from the outermost built-in names scope. Import builtins
(__builtin__ in Python 2.6) and pass it as an argument to the dir function to see
for yourself:


def oops():
    raise IndexError()

def doomed():
    try:
        oops()
    except IndexError:
        print('caught an index error!')
    else:
        print('no error caught...')

if __name__ == '__main__': doomed()

% python oops.py
caught an index error!
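
Both noncoding answers can be checked in a few lines. In this sketch, oops raises KeyError instead of IndexError, and the outer run function is a hypothetical stand-in for the top level so that the uncaught exception can be observed without ending the program; the handlers return strings instead of printing so the result is easy to inspect:

```python
import builtins

def oops():
    raise KeyError('spam')                 # Changed from IndexError

def doomed():
    try:
        oops()
    except IndexError:                     # Does not match KeyError
        return 'caught an index error!'
    else:
        return 'no error caught...'

def run():
    # Hypothetical stand-in for the top level: see what escapes doomed
    try:
        return doomed()
    except KeyError as exc:
        return 'percolated: ' + type(exc).__name__

result = run()
in_builtins = [name in dir(builtins) for name in ('KeyError', 'IndexError')]
print(result, in_builtins)
```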


2. Exception objects and lists. Here’s the way I extended this module for an exception 
of my own: 


class MyError(Exception): pass

def oops():
    raise MyError('Spam!')

def doomed():
    try:
        oops()
    except IndexError:
        print('caught an index error!')
    except MyError as data:
        print('caught error:', MyError, data)
    else:
        print('no error caught...')

if __name__ == '__main__':
    doomed()

% python oops.py
caught error: <class '__main__.MyError'> Spam!


Like all class exceptions, the instance comes back as the extra data; the error message
shows both the class (<...>) and its instance (Spam!). The instance must be
inheriting both an __init__ and a __repr__ or __str__ from Python’s Exception
class, or it would print like the class does. See Chapter 34 for details on how this
works in built-in exception classes.
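
A quick way to confirm what the instance inherits is to compare it with a class that has no Exception superclass. In the sketch below, the Bare class is invented purely for contrast:

```python
class MyError(Exception): pass

class Bare:                                # No Exception superclass
    pass

try:
    raise MyError('Spam!')
except MyError as data:
    text = str(data)                       # Inherited __str__ shows the argument
    args = data.args                       # Inherited __init__ saved it here

default_text = str(Bare())                 # Falls back to the generic <... object at ...> form
print(text, args, default_text)
```

The constructor argument is stored on the instance and used for display only because MyError inherits that behavior; the bare class gets the default object display instead.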


3. Error handling. Here’s one way to solve this one (file safe2.py). I did my tests in a 
file, rather than interactively, but the results are about the same. 


import sys, traceback

def safe(entry, *args):
    try:
        entry(*args)                      # Catch everything else
    except:
        traceback.print_exc()
        print('Got', sys.exc_info()[0], sys.exc_info()[1])

import oops
safe(oops.oops)

% python safe2.py
Traceback (innermost last):
  File "safe2.py", line 5, in safe
    entry(*args)                          # Catch everything else
  File "oops.py", line 4, in oops
    raise MyError, 'world'
hello: world
Got hello world


4. Here are a few examples for you to study as time allows; for more, see follow-up 
books and the Web: 


# Find the largest Python source file in a single directory

import os, glob
dirname = r'C:\Python30\Lib'

allsizes = []
allpy = glob.glob(dirname + os.sep + '*.py')
for filename in allpy:
    filesize = os.path.getsize(filename)
    allsizes.append((filesize, filename))

allsizes.sort()
print(allsizes[:2])
print(allsizes[-2:])


# Find the largest Python source file in an entire directory tree

import sys, os, pprint
if sys.platform[:3] == 'win':
    dirname = r'C:\Python30\Lib'
else:
    dirname = '/usr/lib/python'

allsizes = []
for (thisDir, subsHere, filesHere) in os.walk(dirname):
    for filename in filesHere:
        if filename.endswith('.py'):
            fullname = os.path.join(thisDir, filename)
            fullsize = os.path.getsize(fullname)
            allsizes.append((fullsize, fullname))

allsizes.sort()
pprint.pprint(allsizes[:2])
pprint.pprint(allsizes[-2:])


# Find the largest Python source file on the module import search path

import sys, os, pprint
visited  = {}
allsizes = []
for srcdir in sys.path:
    for (thisDir, subsHere, filesHere) in os.walk(srcdir):
        thisDir = os.path.normpath(thisDir)
        if thisDir.upper() in visited:
            continue
        else:
            visited[thisDir.upper()] = True
        for filename in filesHere:
            if filename.endswith('.py'):
                pypath = os.path.join(thisDir, filename)
                try:
                    pysize = os.path.getsize(pypath)
                except:
                    print('skipping', pypath)
                else:
                    allsizes.append((pysize, pypath))    # Record only if getsize worked

allsizes.sort()
pprint.pprint(allsizes[:3])
pprint.pprint(allsizes[-3:])


# Sum columns in a text file separated by commas

filename = 'data.txt'
sums = {}

for line in open(filename):
    cols = line.split(',')
    nums = [int(col) for col in cols]
    for (ix, num) in enumerate(nums):
        sums[ix] = sums.get(ix, 0) + num

for key in sorted(sums):
    print(key, '=', sums[key])


# Similar to prior, but using lists instead of dictionaries for sums

import sys
filename = sys.argv[1]
numcols  = int(sys.argv[2])
totals   = [0] * numcols
for line in open(filename):
    cols = line.split(',')
    nums = [int(x) for x in cols]
    totals = [(x + y) for (x, y) in zip(totals, nums)]

print(totals)


# Test for regressions in the output of a set of scripts

import os
testscripts = [dict(script='test1.py', args=''),       # Or glob script/args dir
               dict(script='test2.py', args='spam')]

for testcase in testscripts:
    commandline = '%(script)s %(args)s' % testcase
    output = os.popen(commandline).read()
    result = testcase['script'] + '.result'
    if not os.path.exists(result):
        open(result, 'w').write(output)
        print('Created:', result)
    else:
        priorresult = open(result).read()
        if output != priorresult:
            print('FAILED:', testcase['script'])
            print(output)
        else:
            print('Passed:', testcase['script'])


# Build GUI with tkinter (Tkinter in 2.6) with buttons that change color and grow

from tkinter import *                                # Use Tkinter in 2.6
import random
fontsize = 25
colors = ['red', 'green', 'blue', 'yellow', 'orange', 'white', 'cyan', 'purple']

def reply(text):
    print(text)
    popup = Toplevel()
    color = random.choice(colors)
    Label(popup, text='Popup', bg='black', fg=color).pack()
    L.config(fg=color)

def timer():
    L.config(fg=random.choice(colors))
    win.after(250, timer)

def grow():
    global fontsize
    fontsize += 5
    L.config(font=('arial', fontsize, 'italic'))
    win.after(100, grow)

win = Tk()
L = Label(win, text='Spam',
          font=('arial', fontsize, 'italic'), fg='yellow', bg='navy',
          relief=RAISED)
L.pack(side=TOP, expand=YES, fill=BOTH)
Button(win, text='press', command=(lambda: reply('red'))).pack(side=BOTTOM, fill=X)
Button(win, text='timer', command=timer).pack(side=BOTTOM, fill=X)
Button(win, text='grow', command=grow).pack(side=BOTTOM, fill=X)
win.mainloop()


# Similar to prior, but use classes so each window has own state information

from tkinter import *
import random

class MyGui:
    """
    A GUI with buttons that change color and make the label grow
    """
    colors = ['blue', 'green', 'orange', 'red', 'brown', 'yellow']

    def __init__(self, parent, title='popup'):
        parent.title(title)
        self.growing = False
        self.fontsize = 10
        self.lab = Label(parent, text='Gui1', fg='white', bg='navy')
        self.lab.pack(expand=YES, fill=BOTH)
        Button(parent, text='Spam', command=self.reply).pack(side=LEFT)
        Button(parent, text='Grow', command=self.grow).pack(side=LEFT)
        Button(parent, text='Stop', command=self.stop).pack(side=LEFT)

    def reply(self):
        "change the button's color at random on Spam presses"
        self.fontsize += 5
        color = random.choice(self.colors)
        self.lab.config(bg=color,
                font=('courier', self.fontsize, 'bold italic'))

    def grow(self):
        "start making the label grow on Grow presses"
        self.growing = True
        self.grower()

    def grower(self):
        if self.growing:
            self.fontsize += 5
            self.lab.config(font=('courier', self.fontsize, 'bold'))
            self.lab.after(500, self.grower)

    def stop(self):
        "stop the button growing on Stop presses"
        self.growing = False

class MySubGui(MyGui):
    colors = ['black', 'purple']           # Customize to change color choices

MyGui(Tk(), 'main')
MyGui(Toplevel())
MySubGui(Toplevel())
mainloop()


# Email inbox scanning and maintenance utility

"""
scan pop email box, fetching just headers, allowing
deletions without downloading the complete message
"""

import poplib, getpass, sys

mailserver = 'your pop email server name here'                # pop.rmi.net
mailuser   = 'your pop email user name here'                  # brian
mailpasswd = getpass.getpass('Password for %s?' % mailserver)

print('Connecting...')
server = poplib.POP3(mailserver)
server.user(mailuser)
server.pass_(mailpasswd)

try:
    print(server.getwelcome())
    msgCount, mboxSize = server.stat()
    print('There are', msgCount, 'mail messages, size ', mboxSize)
    msginfo = server.list()
    print(msginfo)
    for i in range(msgCount):
        msgnum  = i+1
        msgsize = msginfo[1][i].split()[1]
        resp, hdrlines, octets = server.top(msgnum, 0)         # Get hdrs only
        print('-'*80)
        print('[%d: octets=%d, size=%s]' % (msgnum, octets, msgsize))
        for line in hdrlines: print(line)

        if input('Print?') in ['y', 'Y']:
            for line in server.retr(msgnum)[1]: print(line)    # Get whole msg
        if input('Delete?') in ['y', 'Y']:
            print('deleting')
            server.dele(msgnum)                                # Delete on srvr
        else:
            print('skipping')
finally:
    server.quit()                                  # Make sure we unlock mbox
input('Bye.')                                      # Keep window up on Windows


# CGI server-side script to interact with a web browser

#!/usr/bin/python
import cgi
form = cgi.FieldStorage()                     # Parse form data
print("Content-type: text/html\n")            # hdr plus blank line
print("<HTML>")
print("<title>Reply Page</title>")            # HTML reply page
print("<BODY>")
if not 'user' in form:
    print("<h1>Who are you?</h1>")
else:
    print("<h1>Hello <i>%s</i>!</h1>" % cgi.escape(form['user'].value))
print("</BODY></HTML>")
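
The script above targets Python 3.0. On later releases, cgi.escape is no longer available (it was removed in Python 3.8), and html.escape fills the same role. A minimal sketch of the reply line on its own, with a hypothetical greeting helper standing in for the final print:

```python
import html

def greeting(user):
    # Escape the user-supplied name before embedding it in the reply page
    return "<h1>Hello <i>%s</i>!</h1>" % html.escape(user)

print(greeting('<b>Bob</b>'))
```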


# Database script to populate and query a MySql database

from MySQLdb import Connect
conn = Connect(host='localhost', user='root', passwd='darling')
curs = conn.cursor()
try:
    curs.execute('drop database testpeopledb')
except:
    pass                                           # Did not exist

curs.execute('create database testpeopledb')
curs.execute('use testpeopledb')
curs.execute('create table people (name char(30), job char(10), pay int(4))')

curs.execute('insert people values (%s, %s, %s)', ('Bob', 'dev', 50000))
curs.execute('insert people values (%s, %s, %s)', ('Sue', 'dev', 60000))
curs.execute('insert people values (%s, %s, %s)', ('Ann', 'mgr', 40000))

curs.execute('select * from people')
for row in curs.fetchall():
    print(row)

curs.execute('select * from people where name = %s', ('Bob',))
print(curs.description)
colnames = [desc[0] for desc in curs.description]
while True:
    print('-' * 30)
    row = curs.fetchone()
    if not row: break
    for (name, value) in zip(colnames, row):
        print('%s => %s' % (name, value))

conn.commit()                                      # Save inserted records


# Database script to populate a shelve with Python objects
# see also Chapter 27 shelve and Chapter 30 pickle examples

rec1 = {'name': {'first': 'Bob', 'last': 'Smith'},
        'job':  ['dev', 'mgr'],
        'age':  40.5}

rec2 = {'name': {'first': 'Sue', 'last': 'Jones'},
        'job':  ['mgr'],
        'age':  35.0}

import shelve
db = shelve.open('dbfile')
db['bob'] = rec1
db['sue'] = rec2
db.close()


# Database script to print and update shelve created in prior script

import shelve
db = shelve.open('dbfile')
for key in db:
    print(key, '=>', db[key])

bob = db['bob']
bob['age'] += 1
db['bob'] = bob
db.close()
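
The fetch, change, and store-back pattern in the last script matters: with a shelve opened in its default mode, changing a fetched record in place alters only a temporary copy. The sketch below shows the difference using a throwaway file in a temporary directory; the record is a simplified, made-up version of rec1:

```python
import os
import shelve
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'dbfile')
try:
    with shelve.open(path) as db:
        db['bob'] = {'name': 'Bob', 'age': 40.5}   # Simplified stand-in record

    with shelve.open(path) as db:
        db['bob']['age'] += 1              # Updates a temporary copy only
        after_inplace = db['bob']['age']

        bob = db['bob']                    # Fetch, change, and store back
        bob['age'] += 1
        db['bob'] = bob

    with shelve.open(path) as db:
        after_store = db['bob']['age']
finally:
    shutil.rmtree(tmpdir, ignore_errors=True)

print(after_inplace, after_store)
```

Only the explicit assignment back to the key writes the change to the file, which is why the script reassigns db['bob'] before closing.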
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Symbols 

= and == (equality operators), 244 

* (repetition) operator, 200 

@ symbol, 804 

\ (backslash), 270, 318 

\ (backslash) escape sequences, 85 

& (bitwise AND operator), 108 

| (bitwise or operator), 108 

^ (bitwise XOR operator), 108 

: (colon), 264, 387 

{ } (curly braces), 78, 108, 269 
dictionaries and, 90, 208 
set comprehensions and, 137 
sets and, 135, 221 

/ and // (division operators), 108, 110 
(see also division) 

" (double quotes) and strings, 158 

... (ellipses), 330 

= and == (equality operators), 108, 151 

#! (hash bang), 46 

# (hash character), 43, 376 

>, >=, <, <= (magnitude comparison 

operators), 108 

— (minus operator), 108 

* (multiplication operator), 108 

() (parentheses), 265, 269, 318 
functions and, 389 
generator expressions and, 497 
tuples and, 96 

+ (plus operator), 108, 200 

"Nu..." and "\U..." escapes, 910 

% (remainder operator), 108 

; (semicolon), 265, 269 

>> and << (shift operators), 108 


Index 


' (single quotes) and strings, 158 

[ ] (square brackets), 78, 108, 269 
dictionaries and, 209 
list comprehensions and, 359, 486, 504 
lists and, 89, 199 

_ (underscore), 584 

__add__ method, 634 

__all__ variable, 584 

__bases__ attribute, 697, 699 

__bool__ method, 730 

__call__ method, 725 
function interfaces and, 727 

__class__ attribute, 697, 699 

__cmp__ method (Python 2.6), 729 

__contains__ method, 716 

__del__ method, 732 

__delattr__ method, 956 

__delete__ method, 950 

__dict__ attribute, 550 

__doc__ attribute, 377, 701 

__enter__ method, 854 

__eq__ method, 729 

__exit__ method, 854 

__get__ method, 706, 948 

__getattr__ method, 718, 814, 942, 956-973 
computed attributes, 961 
delegation using, 745 
delegation-based managers, 970 
example, 959 
interception of built-in attributes, 966 
loops, avoiding in interception methods, 958

__getattribute__, compared to, 962 

__getattribute__ method, 794, 942, 956-973 
computed attributes, 961 
delegation-based managers, 970
example, 959
interception of built-in operation attributes, 966
loops, avoiding in attribute interception, 958
__getattr__, compared to, 962 
__getitem__ method, 708 
index iteration, 710 
membership, 716 
__gt__ method, 728 
__iadd__ method, 723 
__init__.py files, 563
__init__ method, 634, 644, 706 
__iter__ method, 711 
design purpose, 713 
membership, 716 
__len__ method, 730 
__lt__ method, 728
__main__ attribute 
__name__ attribute of modules and, 585 
__main__ module, 409 
__metaclass__ variable (Python 2.6), 1063 
__name__ attribute, 585, 647 
command-line arguments with, 587 
unit tests, 586 
__ne__ method, 729 
__next__ method, 352, 711 
__radd__ method, 723 
__repr__ method, 721 
custom exception display using, 867 
__set__ method, 706, 949 
__setattr__ method, 719, 942, 956 
__setitem__ method, 709 
__slots__ attribute, 767, 788 
descriptors and, 956 
__dict__ attribute and, 1026 
__str__ method, 634, 721 
custom exception display using, 867 
overload method for printing objects, 652 
__sub__ method, 706 


A 


abs function, 125 
absolute imports, 570 
abstract superclasses, 690-693 
example, 742 
Python 2.6 and 3.0, 692 
accessor functions, 417 


ActivePython, 1090 
annotation information, 472 
anonymous functions, 474 
anydbm module (Python 2.6), 670 
append method, 87, 203, 388 
apply built-in (Python 2.6), 449 
arbitrary arguments examples 
apply built-in (Python 2.6), 449 
applying functions generically, 448 
collecting arguments, 446 
unpacking arguments, 447 
arguments, 435 
argument passing basics, 435-440
mutable argument changes, avoiding, 438
output parameters, simulating, 439 
shared references, 436 
argument-matching modes, 440-453 
arbitrary arguments examples, 446-450
available modes, 441
defaults, 445
keyword-only arguments (Python 3.0), 450
keywords, 444
keywords and defaults combined, 446
matching syntax, 442
ordering rules, 443
emulating Python 3.0 print in earlier versions, 457
keyword-only arguments, 459 
generalized set functions, 456 
keyword arguments, 460 
min wakeup call, 453 
three ways of coding, 454 
using max instead of min, 455 
ArithmeticError class, 865 
as extension for import and from, 591 
ASCII character code, 897 
coding ASCII text, 905 
assert statement, 691, 850 
trapping constraints example, 851 
AssertionError exception, 850 
assignment 
import, from, and def, 546 
mutables in, 388 
within function classes, 409 
assignment statements, 263, 279-291 
assignment statement forms, 280 
augmented assignments, 289 
sequence assignments, 281-284
extended sequence unpacking in Python 3.0, 284
multiple-target assignments, 288
associative arrays, 207 
as_integer_ratio method, 108 
attribute fetches, 173 
attribute interception methods, 1053 
attribute tree construction, 687 
attributes, 53, 531, 543, 644 


managed attributes (see managed attributes) 


automatic memory management, 15


B


base indicators, 107
BaseException class, 864 
basic numeric literals, 106 
basic statement form, 280 
beginners’ mistakes, 387 
behavior methods, 648 
binary files, 98, 233, 901 
binary numeric literals, 107 
binary-mode files, 920 
in Python 3.0, 921 
bit_length method (Python 3.1), 108 
blank lines, 314, 388 
block delimiters, 315 
blocks, 314 
BOM (byte order marker), 901 
Python 3.0, handling in, 926-928 
book update websites, xlv 
bool type, 248 
Boolean numeric type, 139 
Boolean object type, 100 
Boolean operators, 320-324 
Booleans in Python 2.6, 731 
bound methods, 728, 750 
other callable objects, compared to, 754 
break statement, 329, 331 
bsddb extension module, 672 
built-in exception classes, 864-867 
categories, 865 
class hierarchy, 864 
default printing and state, 866 
built-in mathematical functions, 108 
built-in object types, 15, 75-78 
additional core types, 99-103 
dictionaries, 90-96, 207-223
files, 97, 229-239 


issues to be aware of, 251 
assignment creates references, 251 
cyclic data structures, 252 
immutable types, 253 
repetition adds one level deep, 252 
lists, 86-90, 197 
numbers, 78 
object classifications, 240 
sets, 99 
shared properties, 239 
strings, 80-86 
tuples, 96, 225-229 
type, 100 
built-in scope, 412 
builtins module, 126, 412 
byte code, 7 
compilation, 26 
byte order marker (see BOM) 
bytearray, 157 
object type, using, 917-920
bytearray string type, 899 
bytes, 157 
bytes object, 896 
data encoding in, 901 
literals, 908 
bytes string type, 85, 899 


C 


C code, 388 
call expressions, 173 
calls, 400, 403 
character encoding schemes, 897 
character set encoding declarations, 912 
chmod command, 46 
class attribute descriptors, 1053 
class decorators, 984, 990 
coding, 1011 
decorators versus manager functions, 1018
retaining multiple instances, 1016
singleton classes, 1011
tracing object interfaces, 1013-1016
implementation, 990
justification, 1019
metaclasses, compared to, 1056, 1073-1076, 1080
private attributes, implementing, 1023-1026
public attributes, implementing, 1026-1030
supporting multiple instances, 992 
usage, 990 
class methods, 686, 795, 800 
counting instances, 802 
counting per class, 803 
justification, 795 
using, 799 
class properties, 1053 
class statement, 611, 681-684, 1061
example, 682-684
general form, 681 
classes, 611, 614, 615, 619 
abstract superclasses, 690-693 
as attributes of modules, 631 
built-in types, extending, 773-777 
embedding, 774 
subclassing, 775-777 
class decorators, 807 
class hierarchies, 629 
class instances, 626 
class method calls, 616 
class methods (see class methods) 
class statements, 616 
class trees, 613, 616-619 
classic classes, 778 
coding, 643-675 
behavior methods, 648 
class statement, 681-684
composition, delegation, and embedding, 660
constructors, customizing, 658-663 
databases, storing objects in, 669-675 
docstrings, 701 
inheritance, 687-693 
introspection, 663-669
making instances, 644-648
methods, 649, 684-686
modules, versus, 703 
namespaces, 693-701 
OOP concepts embodied in, 660 
operator overloading, 651-653 
subclassing, 653-658
dependencies and function design, 464 
dictionaries, versus, 639 
distinctions of, 612 
exception classes (see exception classes) 
frameworks, 621 


function decorators, 804-808 
gotchas, 808 
changing class attributes, 808 
changing mutable class attributes, 810 
delegation-based classes (Python 3.0), 814
methods, classes, and nested scopes (Python 2.2 and before), 812
multiple inheritance, 811 
overwrapping, 814 
inheritance, customization by, 629 
instances, generation of, 625-629 
interception of Python operators, 633-636 
justification, 612 
metaclasses, 781, 794, 807 
as namespace objects, 638 
naming conventions, 644 
“new-style” classes, 777-795 
changes, 778-787 
persistence and, 744 
properties of, 626 
simplest class, 636-640 
static and class methods, 795 
subclasses and superclasses, 614 
user-defined classes, 101 
classic division, 110, 117 
classmethod function, 799 
classtree function, 700 
close method, 231 
closure function, 420 
code reuse 
modules and, 530 
OOP and, 619-621 
code reuse and code redundancy, 395 
codecs.open call (Python 2.6), 912 
cohesion, 463 
collections (see lists) 
colon (:), 387 
command line (see interactive prompt) 
command-line arguments, 587 
comments, 43, 314, 376 
companies using Python, 8 
comparison methods, 728 
comparison operators, 728 
comparisons in Python 3.0, 204 
compiled extensions, 7 
complex numbers, 107 
component integration, 10 
composites, 661 
composition, 612, 740-745
stream processing with, 742 
compound statements, 264, 311 
general pattern, 314 
comprehension syntax, 507-509
concatenation, 81 
constructor method, 634 
__init__, 644 
constructors 
coding, 644 
customizing, 658-663
context managers, 854 
file and server connection closure, 879 
continue statement, 329, 331 
control flow statements, 314 
conversionflag, 185 
copy module, nested data structures, copying with, 244
copying versus referencing of objects, 241 
core data types, 77, 648 
count method and tuples, 228 
coupling, 463 
CPython, 29 
cross-file module linking, 532 
cross-file name changes, 547 
curly braces { }, 78, 269 
dictionaries and, 90, 208 
set comprehensions and, 137 
sets and, 135, 221 
CWD (current working directory), 576 
cyclic references, 147 
Cygwin, 1090 
Cython, 33 


D 


data attributes, 682 
data hiding in modules, 583 
data structures, 76 
database programming, 11 
databases, 676 
storing objects in, 669-675 
pickles and shelves, 670-675 
dbm module, 670 
debuggers, 888 
debugging, 67 
assert statement, 850 
trapping constraints example, 851 
outer try statements, using for, 879 
decimal module, 127 


decimal numeric literals, 107 
decimal numeric type, 99, 127-129
decoding and encoding, 898
decorators, 983-995, 1053
call and instance management, 984
class decorators, 990-992
coding, 1011-1020
decorator arguments, 994
versus function annotations, 1043
function decorators, 986-990
coding, 996-1011
functions and classes, managing, 984, 995, 1021
open issues, 1030-1034 
private and public attributes, 1023 
justification, 985 
nesting, 993 
type testing with, 1045 
using and defining, 984 
def statement, 407 
default exception handler, 827 
definitions, 400, 402 
del statement, 87 
delegation, 661, 720, 745 
descriptor protocol, 942 
descriptors, 947-956
descriptor methods, 948 
method arguments, 948 
read-only descriptors, 949 
__delete__ method, 950 
__get__ method, 948 
__set__ method, 949 
__slots__ implementation by, 956 
design patterns, 621 
destructor method, 732 
developer community, 12 
development tools, 886-890 
Python toolset hierarchy, 886 
diamond pattern of multiple inheritance trees, 783
dictionaries, 207-223
basic operations, 209 
changing in place, 210 
classes, versus, 639 
coding of, 208 
common literals and operations, 208 
items method, 211 
languages table example, 212 
pop method, 211 
Python 3.0 comparisons, 246
Python 3.0, changes in, 217 
dictionary comprehensions, 218 
dictionary magnitude comparisons, 222 
dictionary views, 219 
dictionary views and sets, 221 
sorting dictionary keys, 222 
use of in method instead of has_key, 223
update method, 211 
usage notes, 213 
missing-key errors, avoiding, 214 
records, using as, 215 
simulating flexible lists, 213 
sparse data structures, using for, 214 
values method, 211 
ways of making dictionaries, 216 
dictionary comprehensions, 507 
dictionary object type, 90-96 
mapping operations, 90 
missing keys and if tests, 95 
nesting, 91 
sorting keys and for loops, 93 
dictionary view iterators, 370 
dir function, 84, 376, 550, 698 
mix-in classes, listing inherited attributes of, 761
direct or indirect recursion, 467
distutils, 540, 889
division, 110, 117-121 
Python 2.6 and Python 3.0 compared, 114 
docstr.py, 701 
docstrings, 113, 314, 377, 701, 887 
built-in docstrings, 379 
docstring standards, 379 
user-defined docstrings, 378 
doctest, 887 
documentation, 375-387 
dir function, 376 
docstrings (see docstrings) 
hash-mark comments, 376 
PyDoc, 380-385 
reference books, 387 
standard manual set, 386 
web resources, 387 
DOM parsing, 935 
dotted path, 562 
double quotes (") and strings, 158 
dynamic typing, 15, 78, 143-147 


garbage collection, 146 
objects, 144 

versus variables, 145 
polymorphism and, 153 
references, 145 

shared references, 148-152 
variables, 144 


E 


Easter egg, 5 
EBCDIC encoding, 907 
Eclipse, 63 
ElementTree package, 934 
elif (else if) clause, 96, 311 
ellipses (...), 330 
else clause, 96, 837 
(see also for statement; try statement; while statement)
Emacs, 65 
embedded calls, 64 
embedding contrasted with inheritance, 661 
empty strings, 155 
encapsulation, 620, 649 
encoding and decoding, 898 
encodings module, 898 
end-of-line characters, 921 
Enthought Python Distribution, 1090 
enumerate function, 348, 363 
env program, 47 
equality, testing for, 244 
error checking 
Python compared to C, 832 
error handling, 826 
etree package, 935 
eval function, 235 
event notification, 826 
except clause, 837 
(see also try statement) 
empty clauses, 838, 883 
Exception class, 865 
built-in exceptions and system exit events, 884
exception classes, 857-870
advantages, 857 
built-in exception classes, 864-867 
categories, 865 
default printing and state, 866 
hierarchies, 864 
coding, 859 
custom data and behavior, 868-870
providing exception details, 868 
providing exception methods, 869 

custom print displays, 867 

defining handler methods, 869 

exception hierarchies, 859 
justification, 861-864 

exceptions, 825 

assert statement, 850 
trapping constraints example, 851 

catching built-in exceptions example, 841 

catching exceptions, 828 

class-based exceptions, 859 
(see also exception classes) 

for closing files and server connections, 878

default behavior, 840 

default exception handlers, 827 

design tips and gotchas, 882-885 
handler specificity and class-based categories, 885
limiting handler generality, 883 
wrappers, 882 

exception handlers, 826 
nested exception handlers, 873-877 

in-process testing with, 880 

justification, 825 

nonerror exceptions, 877-878
user-defined exceptions, 878 

purposes, 826 

raise statement, 848-850 

raising exceptions, 829 

string exceptions, deprecation of, 858 

termination actions, 830 

try statement (see try statement) 

typical uses for, 877-882 

user-defined exceptions, 830 

with/as statement, 851-855 
context management protocol, 853 
usage, 852 

exec function, 57 

loading modules from a string, 594 
exec statement (Python 2.6), 263 
executable files 

creating with Python, 32 

Unix path, defining in comment, 47 

executable scripts, 46 
execution optimization tools, 30 
exercises, xliii 


Part I, 70 
Part II, 255 
Part III, 390 
Part IV, 524 
Part V, 605 
Part VI, 816 
Part VII, 891 
expression operators, 108 
table of, including precedence, 108 
versions 3.0 and 2.x differences, 110 
expression statements, 295 
in-place changes, 296 
expressions, 75, 108 
mixing operators, 111 
parentheses and, 111 
extend method, 205 
extended slicing, 167 
extensions in Python versions 2.6 and 3.0, xxxv


F 
factories, 768-769 
justification, 769 
factoring of code, 649 
factory design pattern, 768 
factory functions, 420 
false and true values, 246 
fieldname, 185 
file execution, 25 
file icon clicks, 47-51
limitations, 50 
file input/output, Python 3.0, 900 
file iterators, 352 
file object methods and printing operations, 297
file object type, 97 
files, 225, 229-239 
advanced file methods, 238 
common operations, 230 
examples of usage, 232-238 
file context managers, 238 
packed binary data, storing and parsing in files, 237
storing and parsing of Python objects, 234
text and binary files, Python 3.0, 233 
file iterators, 233 
mode string argument for opening, 901 
opening, 230 
pickle, 236
using, 231 
filter, 363 
filter function, 481 
filter iterator, 368 
finally clause, 837, 842 
(see also try statement) 
find method, 83 
fixed-precision floating-point values, 127 
floating point numbers, 106 
floor division, 110, 117 
flush method, 232 
for loop 
iterator, as an example of, 351 
line-by-line iteration with __next__ method, 353
versus while and range, 388 
for statement, 327, 334-341 
examples, 335 
extended sequence unpacking in, 338 
format, 334 
nested for loops, 339 
tuple assignment in, 336 
format function, 187 
format method, 184, 185 
formats.py, 587 
formatspec, 186 
formatting, 83 
fraction number object type, 99 
Fraction numeric type, 129-133 
conversions, 131 
frameworks, 621 
freeze, 32 
from clause (raise statement), 849 
from statement, 52, 53, 545 
as assignment, 546 
equivalence to import, 548 


from imports and reload statement, 601
interactive testing, 602
import statement, versus, 56
name copying without linking, 600
pitfalls, 548-549
corruption of namespaces, 548
reload statement, when used with, 548
when import is required, 549
variables and, 601
_ (underscore) prefix and __all__ variable, 584
from __future__ statement, 571 


from_float method, 131 
frozen binaries, 32, 65, 889 
frozenset built-in call, 137 
function argument-matching forms, 442 
function attributes, 431 
function calls, 616 
function decorators, 804-808, 984, 986
basics, 804
coding, 996-1020
adding arguments, 1008-1011
decorating class methods, 1001-1006
state information retention, 997-1001
timing calls, 1006-1008
tracing calls, 996
example, 805 


function arguments, validating, 1034-1046
generalizing for keywords and defaults, 1037
implementation details, 1040
open issues, 1042
range-tester for positional arguments, 1035
implementation, 987
properties of managed attributes, coding with, 946
supporting method decoration, 989 
usage, 986 
function introspection, 1041 
functional programming, 481 
functions, 395-399 
attributes and annotations, 469-474 
calls, 400, 403 
coding, 396-399 
definitions, 400, 402 
dependencies and function design, 464 
design concepts, 463 
example, definitions and calls, 400 


example, intersecting sequences, 402-404


local variables, 404 
function annotations (Python 3.0), 472 
function attributes, 471 
function introspection, 470
function related statements and expressions, 395

global statement (see global statement) 
gotchas, 518-522 


default arguments and mutable objects, 520
enclosing scope loop variables, 522
functions without returns, 522
static detection of local names, 518 
indirect function calls, 469 
lambda expression (see lambda expression) 
local scope, 408 
mapping over sequences, 479 
nonlocal statement (see nonlocal statement) 
parentheses and, 389 
polymorphism, 401, 403 
purpose of, 396 
recursive functions, 465-469
arbitrary structures, handling, 468 
coding alternatives, 466 
loop statements, versus, 467 
summation, 465 
return statement (see return statement) 
simple functions, 796 
yield statement (see yield statement) 


G 


garbage collection, 92, 146 
generator expressions, 492, 497, 764 
generator functions, 492-506 
examples, 494 
generator expressions, versus, 498 
iteration protocol and, 493 
iteration tools 
coding a map(func, ...), 501 
coding zip(...) and map(None, ...), 502 
emulating zip and map functions, 500-505
one-shot iterations, 505 
send method and __next__, 496 
state suspension, 493 
value generation in built-in types and classes, 506
generator objects, 348 
generators, 89, 499 
get method, 96 
getrefcount function, 152 
global scope, 408 
access without the global statement, 418 
global statement, 409, 414-418 
minimize cross-file changes, 416 
minimize global variables, 415 
Google’s Unladen Swallow project, 33 
GUIs (Graphical User Interfaces), 9, 675 


H 

handlers, 828 

“has-a” relationships, 740 

hash bang (#!), 46 

hash character (#), 43 

hash tables, 208 

hash-mark comments, 376 
hashes, 207 

has_key method (Python 2.x), 96 
help function, 84, 380, 887 
helper functions, 1054 
hexadecimal numeric literals, 107 
home directory, 536


I


IDEs, 63, 888
IDLE (see IDLE user interface) 
IDLE user interface, 58-63
getting support on Linux, 1094 
IDLE debugger, 62 
source code, creation and editing in, 60 
startup in Windows and Unix-like systems, 58
usage and pitfalls, 60 
if clause, 89 
if statement, 96, 311-314 
examples, 312 
format, 311 
multiway branching, 312 
if/else ternary expression, 321 
immutability, 82 
immutable objects, 253 
implementation of shared services and data, 530
implementation-related types, 77 
import hooks, 540 
import statement, 51, 532, 539, 544 
.py file extension and, 45 
as assignment, 546 
cross-file name changes, 547 
enabling new language features, 584 
from statement, equivalence to, 548 
from statement, versus, 56 
usage notes, 56 
imports, 533, 546 
in expressions, 313 
in membership expression, 95 
in-place addition, 725 
in-place change operations, 388
incremental prototyping, 645
indentation, 266-269, 314, 388
rules, 315 
tabs versus spaces, 317 
index method, 206 
and tuples, 228 
indexing, 165, 166 
indexing expressions, 80 
indirect function calls, 469 
infinite loops, 328 
inheritance, 612, 613, 629-632, 687-693 
abstract superclasses, 690-693 
attribute inheritance, key ideas of, 629 
attribute trees, 687 
class interface techniques, 689 
real-world relationships, modeling with, 739
simplicity of inheritance model, 636 
specializing inherited methods, 687 
input function, 49 
insert method, 87, 206 
installing Python, 23 
instance methods, 800 
instances, 614, 615, 625, 626, 643 
making instances, 644-648 
coding constructors, 644 
incremental testing, 645 
test code, 646 
as namespace objects, 638 
int, 169 
int function, 235 
integer division, Python 2.6 versus 3.0, 115 
integers, 106 
Python 3.0, 107 
integrated development environments (see IDEs)
interactive loops, 271-276 
math operations on user input, 272 
nesting code three levels deep, 275 
simple example, 271 
testing inputs, 273 
try statements, handling errors with, 274 
interactive prompt, 35-41
exiting a session, 37 
experimenting with code, 38 
files, running from, 43 
multiline statements, entering, 41 
testing code, 39 


tips for using, 39 
Internet scripting, 10 
interpreters, 23 
introspection, 591 
introspection attributes, 1053 
IronPython, 30, 1091 
is operator, 244 
“is-a” relationships, 739
is_integer method, 108 
items method, 211, 370 
iter function, 354 
iteration, 485 
built-in tools for, 362 
manual iteration, 354 
iteration protocol, 94, 351, 352, 493 
iterators, 351-358 
additional built-in iterators, 356 
file iterators, 352 
filter, 368 
generator functions (see generator functions)
map, 368 
in Python 3.0, 366-371 
range, 367 
support for multiple iterators, 369 
range function, 342 
timing iteration alternatives, 509-518 
other suggestions, 517 
time module, 509 
time module alternatives, 513 
timing results, 511 
timing script, 510 
zip, 368 
iters.py, 712 


J 


JIT (just-in-time) compilation, 31
jump tables, 476 
Jython, xlv, 29, 1091 


K 

keys, 93 

keys method, 370 

keyword arguments, 204, 460, 646 

keyword-only arguments (Python 3.0), 450 
justification, 453 
ordering rules, 452 

Komodo, 63 
L


lambda expression, 474-479 
basics, 474 


defining inline callback functions in tkinter, 479
justification for, 475 
nested lambdas and scopes, 478 
potential for code obfuscation, 477 
lambdas and nested scopes, 422
Latin-1 character encoding, 897
LEGB rule, 410
len function, 80
lexical scoping, 408
Linux Python command line, starting, 36
list comprehension expressions, 88
list comprehensions, 351, 358-362, 485
basics, 359 
best uses of, 490 
extended syntax, 361 
files, using on, 359 
map function and, 491 
text-mode files, 920
text.py, 912 
threenames.py, 54 
time module, 509 
alternatives, 513 
timeit module, 517 
timer module, keyword-only arguments, 516 
tkinter, 59 
getting support on Linux, 1094 
settings, 1094 
top-level code, 387 
top-level file, 51, 531 
transitive module reloads, 595-598
triple quotes, 162 
True and False, 414 
True and False Boolean values, 139 
true and false values, 246 
true division, 110, 117 
truth tests, 320 
try/except statement, 831
try statement, 96, 263, 826, 840
(see also exceptions)
debugging with, 879
except statement and, 828
nested try statements, 873-877
Python 2.5 and later, 835
try/except/else, 835-842
try statement clause forms, 837-839 
try/else clause, 839 
try/finally statement, 842-843 
coding termination actions, 843 
unified try/except/finally, 844-847 
example, 846 
nesting finally and except, 845 
statement syntax, 845 
try/finally statement, 827, 830 
file and server connection closure, 879 
tuple object type, 96 
tuple-unpacking assignment statements, 280 
tuples, 114, 225-229 
common literals and operations, 226 
conversions, methods, and immutability, 228
in for loops, 336 
immutability and tuple contents, 229 
lists, compared to, 229 
sorting, 228 


supported sequence operations, 227 
syntax with parentheses and commas, 227 
type class, 1061 
type hierarchies, 248 
type object type, 100, 250, 1058 
typesubclass.py, 775 


U 
unbound methods, 750, 796 
Python 3.0 status as functions, 752 
undefined name exception, 691 
underscore (_), 584 
Unicode, 897 
strings, coding of, 904 
text, handling in versions 2.6 and 3.0, 896 
Unicode files, 924 
reading and writing (Python 3.0), 924 
decoding mismatches, 925 
file input decoding, 925 
file output encoding, 925 
manual encoding, 924 
unicode string type (Python 2.6), 911 
unicode string type (Python 2.x), 899, 910 
unicode strings, 157 
union function, 456 
unit tests with __name__ attribute, 586 
unittest, 887 
Unix 
env lookup trick, 47 
executable scripts, 46 
Python command line, starting, 36 
Unladen Swallow project, 33 
update method, 211 
user base of Python language, 7 
user-defined classes, 101 
user-defined exceptions, 830 
UTF-8 encoding, 898 
utility modules, 108 


V 
values method, 211, 370 
van Rossum, Guido, 14 
variables, 113, 144-145 
declaration, 114 
initialization, 546 
local variables, 404 
scope, 408 
variable name rules, 292-295 
websites, 676

while loop, 94 
versus for loops, 388 

while statement, 327 
range function and, 342 

Windows 
automatic file extensions, 45 
executable files, displaying output, 49 
icon clicks for program initiation, 47 
IDLE user interface and, 58 
program files, opening with icons, 47 
Python command line, starting in, 36 
Python files, running in, 44 
Python standard manual set, 386 

Windows Notepad, file encoding specification, 926

with statement, 129, 842 

with/as extension, 263 

with/as statement, 832, 851-855 
context management protocol, 853 
usage, 852 

wrapper classes, 745 

wrapper objects, 984 

wrappers, catching exceptions with, 882 

write method, 232 
XML, 934

yield expression, 263

yield operator, 108 

yield statement, 397, 399 
usage in generators, 493 


Z 
zip, 363, 365 
zip function, 345 
dictionary construction using, 347 
zip iterator, 368 
ZODB object-oriented database system, 676 